pith. machine review for the scientific record.

arxiv: 2604.08891 · v1 · submitted 2026-04-10 · 💻 cs.LG

Recognition: unknown

Adaptive Candidate Point Thompson Sampling for High-Dimensional Bayesian Optimization

Authors on Pith · no claims yet

Pith reviewed 2026-05-10 17:10 UTC · model grok-4.3

classification 💻 cs.LG
keywords Bayesian optimization · Thompson sampling · high-dimensional optimization · candidate points · Gaussian processes · adaptive sampling · surrogate models · maximization

The pith

Adaptive Candidate Thompson Sampling improves high-dimensional Bayesian optimization by generating candidates in gradient-guided subspaces.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that standard Thompson sampling for Bayesian optimization struggles in high dimensions because fixed sets of candidate points become too sparse to reliably capture the posterior over the maximizer of a Gaussian process surrogate. ACTS addresses this by drawing a function from the surrogate posterior, computing its gradient at the current incumbent, and restricting new candidate points to the subspace aligned with that gradient before selecting the next evaluation point. This adaptive reduction keeps the sampling tractable while increasing the chance of hitting better maxima. A reader would care because the change is presented as a lightweight swap-in for existing Thompson sampling pipelines, including those already using trust regions, and the paper reports gains on both synthetic test functions and real tasks.
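
To make the mechanism concrete, here is a minimal sketch of one such step. It is not the paper's implementation: the posterior draw uses random Fourier features for an RBF kernel as a common pathwise-sampling stand-in, and the function names (rff_posterior_sample, gradient_guided_candidates), step length, and spread are illustrative placeholders rather than the paper's settings.

```python
import numpy as np

def rff_posterior_sample(X, y, lengthscale=1.0, noise=1e-3, n_features=512, rng=None):
    """Approximate draw from an RBF-kernel GP posterior via random Fourier
    features (a common pathwise-sampling stand-in; the paper's sampler may
    differ). Returns the sampled function and its analytic gradient."""
    rng = np.random.default_rng() if rng is None else rng
    dim = X.shape[1]
    W = rng.normal(scale=1.0 / lengthscale, size=(n_features, dim))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    phi = lambda Z: np.sqrt(2.0 / n_features) * np.cos(Z @ W.T + b)  # feature map
    Phi = phi(X)
    A = Phi.T @ Phi + noise * np.eye(n_features)
    w_mean = np.linalg.solve(A, Phi.T @ y)        # Bayesian linear regression
    w_cov = noise * np.linalg.inv(A)              # in feature space
    w_cov = 0.5 * (w_cov + w_cov.T)               # symmetrize for sampling
    w = rng.multivariate_normal(w_mean, w_cov)    # one posterior weight sample

    f = lambda Z: phi(np.atleast_2d(Z)) @ w

    def grad_f(x):
        # d/dx of sum_j w_j * sqrt(2/m) * cos(W_j . x + b_j)
        s = -np.sqrt(2.0 / n_features) * np.sin(x @ W.T + b)
        return (s * w) @ W

    return f, grad_f

def gradient_guided_candidates(x_inc, grad, n_cand=2000, step=0.2, width=0.02, rng=None):
    """Illustrative candidate generator: points along the ascent direction of
    the sampled function, with a small orthogonal spread, clipped to [0, 1]^d.
    The step and width values are placeholders, not the paper's settings."""
    rng = np.random.default_rng() if rng is None else rng
    d = grad / (np.linalg.norm(grad) + 1e-12)
    pts = x_inc + rng.uniform(0.0, step, size=(n_cand, 1)) * d \
                + width * rng.normal(size=(n_cand, x_inc.size))
    return np.clip(pts, 0.0, 1.0)

# One Thompson-sampling step under these assumptions.
rng = np.random.default_rng(0)
dim = 50
X_obs = rng.uniform(size=(30, dim))              # toy observations on [0, 1]^dim
y_obs = -np.linalg.norm(X_obs - 0.5, axis=1)
f, grad_f = rff_posterior_sample(X_obs, y_obs, rng=rng)
x_inc = X_obs[np.argmax(y_obs)]                  # current incumbent
cands = gradient_guided_candidates(x_inc, grad_f(x_inc), rng=rng)
x_next = cands[np.argmax(f(cands))]              # next point to evaluate
```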

Core claim

ACTS generates candidate points for Thompson sampling by restricting the search to subspaces aligned with the gradient of a draw from the surrogate posterior, thereby increasing candidate density in regions that are likely to contain high-value points without requiring a new global GP approximation.

What carries the argument

Adaptive Candidate Thompson Sampling (ACTS), which adaptively reduces the search space by generating candidates in subspaces guided by the gradient of a surrogate model sample.

If this is right

  • Produces higher-quality samples of the objective maximizer than fixed-candidate Thompson sampling.
  • Delivers improved optimization performance on synthetic and real-world benchmarks.
  • Functions as a drop-in replacement that combines with existing trust-region or local-approximation variants of Thompson sampling (a minimal interface sketch follows this list).
  • Mitigates the exponential sparsity of candidate sets as input dimension grows.
  • Maintains compatibility with any Gaussian-process surrogate without altering its training or inference routine.
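
A minimal interface sketch of that drop-in property, under assumed names (ts_step, fixed_candidates, gradient_guided_candidates are illustrative, not the paper's API): the Thompson-sampling step only consumes a candidate array, so a fixed space-filling generator and a gradient-guided one are interchangeable, and a trust-region box can be applied by clipping the candidates.

```python
import numpy as np

def ts_step(posterior_sample, make_candidates):
    """Generic Thompson-sampling step: evaluate one posterior draw on a
    candidate set and return the maximizer. Only the candidate generator
    changes between the fixed and the adaptive variant."""
    cands = make_candidates()
    return cands[np.argmax(posterior_sample(cands))]

def fixed_candidates(dim, n=4096, rng=np.random.default_rng(0)):
    # Stand-in for a fixed space-filling candidate set on [0, 1]^dim
    # (uniform here for brevity; a Sobol sequence would be typical).
    return rng.uniform(size=(n, dim))

def gradient_guided_candidates(x_inc, grad, n=4096, step=0.2, width=0.02,
                               trust_radius=None, rng=np.random.default_rng(0)):
    # Hypothetical adaptive generator: candidates concentrated along the
    # gradient direction at the incumbent, with a small orthogonal spread.
    d = grad / (np.linalg.norm(grad) + 1e-12)
    pts = x_inc + rng.uniform(0.0, step, size=(n, 1)) * d \
                + width * rng.normal(size=(n, x_inc.size))
    if trust_radius is not None:                  # optional TuRBO-style box
        pts = np.clip(pts, x_inc - trust_radius, x_inc + trust_radius)
    return np.clip(pts, 0.0, 1.0)

# Usage: swap the generator without touching the TS step or the surrogate.
dim, rng = 100, np.random.default_rng(1)
sample = lambda X: -np.sum((X - 0.3) ** 2, axis=1)  # toy stand-in for a posterior draw
x_inc, g = rng.uniform(size=dim), np.ones(dim)      # placeholder incumbent and gradient
x_fixed = ts_step(sample, lambda: fixed_candidates(dim))
x_acts = ts_step(sample, lambda: gradient_guided_candidates(x_inc, g, trust_radius=0.1))
```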

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same gradient-guided subspace idea could be applied to other posterior-sampling acquisition functions beyond pure Thompson sampling.
  • In extremely high dimensions the method may still require an outer dimensionality-reduction step to keep gradient estimation stable.
  • Because the subspace choice depends on a single sample, repeated independent runs of ACTS on the same problem could be used to quantify uncertainty in the selected region.
  • The approach suggests a general design pattern: replace static discretization with cheap, model-derived local structure whenever sampling from an intractable posterior.

Load-bearing premise

Subspaces chosen from the gradient of a surrogate sample will reliably contain better maximizer locations without systematically missing the global optimum or introducing bias.

What would settle it

A high-dimensional benchmark function where running ACTS repeatedly yields lower final values or fails to locate a known global maximum compared with standard fixed-candidate Thompson sampling.
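
A toy version of that falsification test, with strong simplifying assumptions: the objective is a quadratic with a known maximizer, and the "gradient" policy is handed the true ascent direction as an idealized stand-in for the gradient of a posterior sample. It only illustrates what such a repeated-run comparison would measure (the gap between the best candidate and the known optimum), not the paper's benchmarks or method.

```python
import numpy as np

def best_candidate_gap(dim, policy, n_cand=4096, n_runs=20, seed=0):
    """Toy harness: f(x) = -||x - x*||^2 on [0, 1]^dim has a known maximizer
    x*, so the gap between the best candidate and the optimum is measurable."""
    rng = np.random.default_rng(seed)
    gaps = []
    for _ in range(n_runs):
        x_star = rng.uniform(0.2, 0.8, size=dim)     # known global maximizer
        x_inc = rng.uniform(size=dim)                # pretend incumbent
        if policy == "uniform":
            cands = rng.uniform(size=(n_cand, dim))  # fixed space-filling set
        else:
            r = np.linalg.norm(x_star - x_inc)
            d = (x_star - x_inc) / r                 # idealized "sample gradient"
            cands = np.clip(x_inc + rng.uniform(0.0, 1.2 * r, (n_cand, 1)) * d
                            + 0.05 * rng.normal(size=(n_cand, dim)), 0.0, 1.0)
        best = np.max(-np.sum((cands - x_star) ** 2, axis=1))
        gaps.append(-best)                           # squared distance to the optimum
    return float(np.mean(gaps))

# The fixed set's gap grows rapidly with dimension; the gradient-guided set's grows far more slowly.
for dim in (10, 100, 500):
    print(dim, best_candidate_gap(dim, "uniform"), best_candidate_gap(dim, "gradient"))
```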

Figures

Figures reproduced from arXiv: 2604.08891 by Donney Fan, Geoff Pleiss.

Figure 1
Figure 1. Figure 1: Illustration of Adaptive Candidate Thompson Sampling. A GP posterior sample (background colour) can only be evaluated at a finite set of candidate points. (Left.) Standard candidate point methods produce a poor discretization of the input domain. (Center.) ACTS only produces candidate points in a subspace that is aligned with the gradient of the posterior sample, increasing density in a region where a (loc… view at source ↗
Figure 2
Figure 2. Figure 2: Optimization performance of medium-dimensional real-world optimization problems. ACTS consistently exhibits top performance (within two standard errors) and achieves high objective values earlier in optimization than other methods. 2018) to efficiently compute the marginal log-likelihood. Each run is warm-started with 30 initial points from a Sobol sequence. Results are averaged over 10 runs with plots sho… view at source ↗
Figure 3
Figure 3. Figure 3: Optimization performance on several high-dimensional problems. ACTS ranks highly in all benchmarks, achieving top performance (within two standard errors) on all benchmarks, and outperforming RAASP (with significance) on MOPTA08, SVM, and Median Molecules 1. (Note that TuRBO does not consistently improve performance on all benchmarks, but the performance boost/regression it induces is consistent across all… view at source ↗
Figure 4
Figure 4. Figure 4: Batch optimization performance (q = 100) of selected high-dimensional problems. ACTS achieves top results (within two standard errors) on all benchmarks. view at source ↗
Figure 5
Figure 5. Figure 5: ACTS increases candidate point density by shrinking search spaces. ACTS exhibits volumes similar to TuRBO without the direct use of trust regions in two BO runs of Rover, with and without TuRBO, up to the first restart. Search Behaviour. ACTS substantially reduces the search space, which enables a high density of candidate points in promising regions. While this might suggest a tendency toward local search… view at source ↗
Figure 7
Figure 7. Figure 7: Optimization performance of Sobol and RAASP policies with and without ACTS on high-dimensional objectives. RAASP consistently outperforms the poorly-performing Sobol policy. ACTS search spaces provide a significant performance boost to all methods, especially Sobol. This indicates the ACTS search spaces provide better fidelity for TS maximization than TuRBO trust regions. yields robust performance across … view at source ↗
Figure 8
Figure 8. Figure 8: Optimization performance of high-dimensional optimization problems without TuRBO. The TS methods exhibit strong performance compared to several non-TS benchmarks, though notably the LogEI and BAxUS baselines obtain significantly better performance on Median Molecules 2. B EXTENDED OPTIMIZATION RESULTS In Figs. 8 and 9, we provide the full results of the high-dimensional optimization problems. We compare to… view at source ↗
Figure 9
Figure 9. Figure 9: Optimization performance of high-dimensional optimization problems with TuRBO. When using TuRBO trust regions, the Thompson sampling methods' optimization performance can be enhanced. view at source ↗
Figure 10
Figure 10. Figure 10: Ranking significance of selected TS methods. view at source ↗
Figure 11
Figure 11. Figure 11: Optimization performance of selected high-dimensional benchmarks of small batches (q = 10). ACTS achieves strong performance in all benchmarks compared to other TS methods, though potentially requiring a larger evaluation budget to overcome RAASP in SVM. Batch Diversity. When constructing a batch, it is typically important to have a diverse set of points so that posterior uncertainty is reduced in many lo… view at source ↗
Figure 12
Figure 12. Figure 12: Optimization performance of selected high-dimensional benchmarks of medium-sized batches (q = 50). ACTS ranks high among TS methods, with Pathwise finding the best performance in MOPTA08. D GLOBAL CONSISTENCY OF ACTS We show that ACTS will query the global maximizer as the number of evaluations tends to infinity. Theorem 1. Choose any ϵ > 0 and assume the following: A1. k is a non-degenerate kernel with k… view at source ↗
Figure 13
Figure 13. Figure 13: Global space-filling sequences fail to fill the space of high-dimensional problems. We track the best observed objective value as the number of space-filling points increases, up to a budget of 10^6 points. Aside from low-dimensional problems, the best found point remains far from the (approximate) optimum. (Panels: Rover 60D, MOPTA08 124D, LassoDNA 180D, SVM 388D.) view at source ↗
Figure 14
Figure 14. Figure 14: Mean Euclidean distance (± one standard error) from x_t to x_{t−1}, averaged over t and 10 runs (arbitrary units). E CURSE OF DIMENSIONALITY OF STANDARD THOMPSON SAMPLING We illustrate the poor approach of “Standard Thompson Sampling” in high-dimensional problems, where such an approach involves using a space-filling sequence to construct the candidate points X̃. Then x_{t+1} = arg max_{x ∈ X̃} f(x), where f ∼ p(f | … view at source ↗
Figure 15
Figure 15. Figure 15: Travelling Salesman Problem tour (± one standard error) for several selected objectives and acquisitions (arbitrary units), averaged over 10 runs. (Panels: Rover 60D, MOPTA08 124D, LassoDNA 180D, SVM 388D; x-axis: Iteration.) view at source ↗
Figure 16
Figure 16. Figure 16: ACTS reduces the uncertainty appreciably with new observations. We aggregate over 10 runs, where the uncertainty in the gradient is measured by the trace of the covariance of the gradient distribution. G REDUCTION IN GRADIENT UNCERTAINTY As we acquire data over time, especially if we’ve been stuck at x0 for some time, our estimation of the gradient will improve, even without explicit gradient observations… view at source ↗
Figure 17
Figure 17. Figure 17: Optimization performance of ACTS when ablating on different search spaces. While the 1D line search spaces provide a comparable level of performance in some cases, they fail to be as performant as the proposed cone search space. I ABLATION STUDY OF ACTS WITH RAASP We ablate on different parameters of the adaptive RAASP-style policy, defined in Eq. 9, where we recall one of the probabilities as P = cγ, wher… view at source ↗
Figure 18
Figure 18. Figure 18: Candidate point quality of 1D line subspace. (See Figure 6 to compare). While the observed TS maxes… view at source ↗
Figure 19
Figure 19. Figure 19: Increased preference to large-magnitude gradient dimensions improves ACTS. Except when using “TopK”, the final performance remains competitive. view at source ↗
Figure 20
Figure 20. Figure 20: Increased preference to large-magnitude gradient dimensions improves ACTS. However, the effect is less pronounced when using TuRBO. view at source ↗
Figure 21
Figure 21. Figure 21: Optimization performance when ablating on the effective number of dimensions perturbed by RAASP. view at source ↗
Figure 22
Figure 22. Figure 22: Optimization performance when ablating on the effective number of dimensions perturbed by RAASP when using TuRBO. view at source ↗
read the original abstract

In Bayesian optimization, Thompson sampling selects the evaluation point by sampling from the posterior distribution over the objective function maximizer. Because this sampling problem is intractable for Gaussian process (GP) surrogates, the posterior distribution is typically restricted to fixed discretizations (i.e., candidate points) that become exponentially sparse as dimensionality increases. While previous works aim to increase candidate point density through scalable GP approximations, our orthogonal approach increases density by adaptively reducing the search space during sampling. Specifically, we introduce Adaptive Candidate Thompson Sampling (ACTS), which generates candidate points in subspaces guided by the gradient of a surrogate model sample. ACTS is a simple drop-in replacement for existing TS methods -- including those that use trust regions or other local approximations -- producing better samples of maxima and improved optimization across synthetic and real-world benchmarks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces Adaptive Candidate Thompson Sampling (ACTS) for high-dimensional Bayesian optimization with Gaussian process surrogates. Thompson sampling is made tractable by restricting the posterior over the maximizer to a discrete candidate set; ACTS adaptively constructs this set inside low-dimensional subspaces aligned with the gradient of a single draw from the GP posterior. The method is positioned as an orthogonal, drop-in replacement for existing TS variants (including trust-region approaches) that yields higher-quality maximizer samples and better optimization performance on synthetic and real-world benchmarks.

Significance. If the central claim holds, ACTS would provide a lightweight, assumption-light way to increase candidate density in high dimensions without relying on scalable GP approximations or fixed trust regions. The drop-in compatibility and reported gains on both synthetic and real benchmarks would make it a practical contribution to the BO literature, particularly if the subspace construction can be shown to preserve coverage of the global argmax.

major comments (2)
  1. [Abstract] The abstract states that subspaces are 'guided by the gradient of a surrogate model sample' and that this produces 'better samples of maxima.' However, no argument or bound is supplied showing that a single posterior sample's gradient direction reliably intersects the basin of the global maximizer rather than a local mode. In multimodal high-dimensional landscapes this is load-bearing for the claim that ACTS is unbiased relative to standard TS.
  2. [Method / Experiments] The skeptic note highlights that the subspace radius and selection rule are not shown to guarantee intersection with the global optimum when modes are separated by distances larger than the local curvature scale. If the full manuscript contains no theoretical coverage guarantee or ablation on deliberately multimodal test functions (e.g., with well-separated modes), the empirical improvements cannot be attributed to the method rather than to the particular benchmark suite.
minor comments (2)
  1. [Abstract] The abstract claims ACTS is 'parameter-free' relative to trust-region methods, yet the subspace construction necessarily introduces at least a radius or dimensionality-reduction hyper-parameter; this should be clarified and compared explicitly.
  2. [Method] Notation for the surrogate sample, gradient, and candidate-set construction should be introduced with a short equation or pseudocode block early in the method section to make the drop-in replacement property immediately verifiable.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their constructive comments and the opportunity to clarify aspects of our work. We address each major comment below and indicate the revisions made to the manuscript.

read point-by-point responses
  1. Referee: [Abstract] The abstract states that subspaces are 'guided by the gradient of a surrogate model sample' and that this produces 'better samples of maxima.' However, no argument or bound is supplied showing that a single posterior sample's gradient direction reliably intersects the basin of the global maximizer rather than a local mode. In multimodal high-dimensional landscapes this is load-bearing for the claim that ACTS is unbiased relative to standard TS.

    Authors: We appreciate this observation. The original manuscript does not provide a theoretical bound or argument for reliable intersection with the global maximizer's basin, as the approach is heuristic. We do not claim that ACTS is unbiased relative to standard TS; it is an approximation method that adaptively densifies candidates in subspaces likely to contain high-value regions based on posterior samples. In the revised version, we have modified the abstract to avoid any implication of unbiasedness and added a paragraph in the discussion section acknowledging the limitations in highly multimodal landscapes and the reliance on empirical performance. revision: yes

  2. Referee: [Method / Experiments] The skeptic note highlights that the subspace radius and selection rule are not shown to guarantee intersection with the global optimum when modes are separated by distances larger than the local curvature scale. If the full manuscript contains no theoretical coverage guarantee or ablation on deliberately multimodal test functions (e.g., with well-separated modes), the empirical improvements cannot be attributed to the method rather than to the particular benchmark suite.

    Authors: We agree that no theoretical coverage guarantee is provided, and deriving one for general cases remains an open challenge. The manuscript includes experiments on multimodal functions like the 6D Hartmann function, but we acknowledge the lack of specific ablations on functions with well-separated modes. In the revised manuscript, we have included an additional ablation study using a high-dimensional multimodal function with separated modes to demonstrate the method's behavior. We have also expanded the description of the subspace radius and selection rule in Section 3 to clarify their design choices. However, we maintain that the empirical gains on the reported benchmarks are attributable to the adaptive candidate construction, as evidenced by comparisons with standard TS variants. revision: partial

standing simulated objections not resolved
  • Providing a formal theoretical guarantee that the gradient of a single posterior sample reliably leads to the global maximizer in arbitrary multimodal high-dimensional functions.

Circularity Check

0 steps flagged

No circularity: new adaptive subspace construction for TS is independent of its claimed performance

full rationale

The paper introduces ACTS as an algorithmic modification to Thompson sampling that restricts candidate generation to gradient-guided subspaces of a single posterior draw. No equations, fitted parameters, or predictions are presented that reduce to the method's own inputs by construction. The central claims rest on empirical benchmarks rather than a self-referential derivation or uniqueness theorem. Self-citations, if present in the full text, are not load-bearing for the core construction, which is described as a drop-in replacement orthogonal to prior scalable GP or trust-region approaches. The derivation chain is therefore self-contained and does not exhibit any of the enumerated circular patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Review is abstract-only; the ledger is therefore minimal and provisional. Full paper may introduce fitted length-scales, trust-region radii, or gradient-estimation tolerances.

axioms (2)
  • domain assumption Gaussian-process surrogates provide usable gradient information for guiding subspace selection.
    Standard modeling choice in Bayesian optimization; invoked implicitly when the method uses surrogate gradients.
  • ad hoc to paper Reducing the candidate set to gradient-guided subspaces still yields representative samples of the posterior maximizer.
    Core unproven premise of ACTS; not justified in the abstract.

pith-pipeline@v0.9.0 · 5425 in / 1229 out tokens · 75514 ms · 2026-05-10T17:10:23.749256+00:00 · methodology

discussion (0)

