arxiv: 1807.02811 · v1 · submitted 2018-07-08 · 📊 stat.ML · cs.LG· math.OC

Recognition: 2 theorem links

· Lean Theorem

A Tutorial on Bayesian Optimization

Peter I. Frazier

Authors on Pith no claims yet

Pith reviewed 2026-05-13 14:12 UTC · model grok-4.3

classification 📊 stat.ML cs.LGmath.OC

keywords Bayesian optimizationGaussian process regressionacquisition functionsexpected improvemententropy searchknowledge gradientsurrogate modelsnoisy evaluations

0 comments

The pith

Bayesian optimization builds a Gaussian process surrogate for an expensive objective and uses an acquisition function to choose each next evaluation point.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The tutorial presents Bayesian optimization as a method for finding optima of functions that require minutes or hours per evaluation. It models the unknown objective with Gaussian process regression to produce both a mean prediction and a variance at every unevaluated location. From this surrogate it defines an acquisition function that scores candidate points according to the expected value of the information they would provide. The paper supplies a decision-theoretic generalization of expected improvement that remains valid when observations contain noise. It also surveys extensions to parallel evaluations, multi-fidelity sources, constraints, and derivative information.

Core claim

Bayesian optimization constructs a surrogate model of the objective function using Gaussian process regression to capture both the predicted value and the uncertainty at unevaluated points. It then defines an acquisition function from this model, such as expected improvement, entropy search, or knowledge gradient, to choose the next point to evaluate by maximizing the expected utility of the information gained. The tutorial extends this to noisy settings by providing a decision-theoretically justified version of expected improvement that accounts for observation noise.

What carries the argument

Gaussian process regression that supplies a posterior mean and variance, paired with an acquisition function that converts this posterior into a scalar score for choosing the next evaluation location.

If this is right

The method requires far fewer evaluations than exhaustive search when each evaluation is expensive.
The noise-aware expected improvement permits reliable optimization even when function values are corrupted by stochastic noise.
Parallel and multi-fidelity extensions allow the same framework to use batches of evaluations or cheaper proxy models.
Derivative information, when available, can be incorporated directly into the Gaussian process to sharpen the surrogate.
Constraints and multi-task formulations are handled by modifying the acquisition function without changing the core surrogate.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same surrogate-plus-acquisition structure can be reused for hyperparameter tuning in machine learning, where each trial is costly.
The formal noisy expected improvement may outperform earlier heuristic adjustments on real data sets that contain measurement error.
For problems exceeding roughly twenty dimensions the Gaussian process surrogate becomes computationally expensive, suggesting a natural boundary for the method's direct application.

Load-bearing premise

A Gaussian process supplies an adequate probabilistic model of the unknown objective function.

What would settle it

On a standard benchmark function whose optimum and noise level are known in advance, compare the number of evaluations required by the method against random search or a simple grid; if the method needs as many or more evaluations to reach the known optimum, the claimed efficiency does not hold.

read the original abstract

Bayesian optimization is an approach to optimizing objective functions that take a long time (minutes or hours) to evaluate. It is best-suited for optimization over continuous domains of less than 20 dimensions, and tolerates stochastic noise in function evaluations. It builds a surrogate for the objective and quantifies the uncertainty in that surrogate using a Bayesian machine learning technique, Gaussian process regression, and then uses an acquisition function defined from this surrogate to decide where to sample. In this tutorial, we describe how Bayesian optimization works, including Gaussian process regression and three common acquisition functions: expected improvement, entropy search, and knowledge gradient. We then discuss more advanced techniques, including running multiple function evaluations in parallel, multi-fidelity and multi-information source optimization, expensive-to-evaluate constraints, random environmental conditions, multi-task Bayesian optimization, and the inclusion of derivative information. We conclude with a discussion of Bayesian optimization software and future research directions in the field. Within our tutorial material we provide a generalization of expected improvement to noisy evaluations, beyond the noise-free setting where it is more commonly applied. This generalization is justified by a formal decision-theoretic argument, standing in contrast to previous ad hoc modifications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

A clear tutorial on Bayesian optimization that adds a formal decision-theoretic version of noisy expected improvement.

read the letter

This tutorial gives a straightforward account of how Bayesian optimization works for expensive black-box problems and includes one technical addition worth noting. The core is a walk-through of Gaussian process regression as the surrogate, followed by the main acquisition functions: expected improvement, entropy search, and knowledge gradient. It then covers practical extensions such as parallel evaluations, multi-fidelity and multi-source optimization, constraints, random environmental conditions, multi-task settings, and derivative information, plus a short section on available software and open questions. The writing stays accessible without skipping the key ideas. The new piece is a generalization of expected improvement to noisy observations, presented as following directly from a decision-theoretic argument rather than the ad hoc adjustments common in earlier work. If the derivation is clean, it supplies a principled default for the noisy case that many practitioners actually face. The main limitation is that the paper is still a tutorial, so its value rests on synthesis and clarity rather than new theorems or large-scale experiments. It inherits the usual modeling assumption that a Gaussian process is an adequate surrogate, which holds for many smooth objectives but can be off when the function has discontinuities or strong non-stationarity. The abstract flags the noisy-EI result as original, but without the full derivation visible here it is hard to judge how much it truly differs from prior formalizations. No circular reasoning or invented quantities appear in the outline. This is the kind of paper that helps students and engineers who need to implement or tune Bayesian optimization in practice. A reading group focused on applied machine learning or experimental design would get use from it. It deserves peer review because the tutorial material is reliable and the added justification for noisy expected improvement, if it holds, fills a small but real gap in the standard toolkit.

Referee Report

1 major / 3 minor

Summary. This tutorial describes Bayesian optimization for expensive-to-evaluate black-box functions over low-dimensional continuous domains. It covers Gaussian process regression as the surrogate, three acquisition functions (expected improvement, entropy search, knowledge gradient), and extensions including parallel evaluations, multi-fidelity optimization, constraints, random environmental conditions, multi-task settings, and derivative information. The central technical contribution is a generalization of expected improvement to noisy observations, derived via a formal decision-theoretic argument rather than ad hoc modifications.

Significance. If the decision-theoretic derivation of noisy expected improvement is correct, the paper supplies a clear, self-contained reference that consolidates established material while adding a principled treatment of noise. The breadth of advanced topics and the explicit contrast with prior ad hoc approaches make it potentially useful for both newcomers and practitioners seeking a unified exposition.

major comments (1)

The decision-theoretic justification for the noisy-EI generalization is presented as the key novelty, yet the manuscript does not include an explicit reduction showing that the new acquisition function recovers the standard noise-free EI when observation noise variance approaches zero. This step is load-bearing for the claim that the generalization is principled rather than ad hoc.

minor comments (3)

Notation for the noisy observation model (e.g., y = f(x) + ε) is introduced without a dedicated equation number; cross-referencing would improve readability in the acquisition-function sections.
Several figures illustrating GP posterior samples and acquisition surfaces lack axis labels or legends indicating the noise level, making it difficult to connect the visuals to the noisy-EI derivation.
The discussion of software packages in the final section would benefit from explicit version numbers or DOIs for the cited libraries to aid reproducibility.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their careful reading of the manuscript and for the positive assessment. We address the single major comment below.

read point-by-point responses

Referee: The decision-theoretic justification for the noisy-EI generalization is presented as the key novelty, yet the manuscript does not include an explicit reduction showing that the new acquisition function recovers the standard noise-free EI when observation noise variance approaches zero. This step is load-bearing for the claim that the generalization is principled rather than ad hoc.

Authors: We agree that an explicit reduction to the noise-free case would make the principled character of the generalization clearer. In the revised manuscript we will add a short derivation (in the main text or an appendix) showing that the proposed acquisition function recovers standard expected improvement in the limit of vanishing observation noise. The argument follows directly from the decision-theoretic construction once the posterior variance contributed by observation noise is set to zero. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper is a tutorial explaining Gaussian process regression and standard acquisition functions (expected improvement, entropy search, knowledge gradient) before presenting a generalization of expected improvement to noisy observations. This generalization is derived from a decision-theoretic argument that starts from the definition of the acquisition function and the posterior over the objective; the derivation does not reduce to a fitted parameter renamed as a prediction, a self-referential definition, or a load-bearing self-citation. All steps remain self-contained against external benchmarks (standard GP theory and decision theory) with no equations shown to be equivalent to their inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The work is a tutorial that relies on standard background assumptions from the Gaussian process and Bayesian optimization literature rather than introducing new free parameters or entities.

axioms (1)

domain assumption Gaussian process regression supplies a suitable surrogate model that quantifies uncertainty for the objective function
Stated as the foundation of the method in the abstract.

pith-pipeline@v0.9.0 · 5494 in / 1162 out tokens · 70015 ms · 2026-05-13T14:12:39.323428+00:00 · methodology

discussion (0)

Forward citations

Cited by 22 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Elicitation-Augmented Bayesian Optimization
cs.LG 2026-05 unverdicted novelty 7.0

A cost-aware value-of-information acquisition function is derived to balance direct observations against noisy pairwise human comparisons in Bayesian optimization, approaching the convex hull of the individual informa...
Bayesian Optimization with Structured Measurements: A Vector-Valued RKHS Framework
cs.LG 2026-05 unverdicted novelty 7.0

Proposes a vector-valued RKHS framework for Bayesian optimization with structured measurements, deriving concentration bounds and UCB-based regret guarantees that recover sublinear rates.
Learning myopic mixed-integer nonlinear model predictive control from expert demonstrations
eess.SY 2026-05 unverdicted novelty 7.0

A myopic MINMPC framework learns a value function offline via inverse optimization from expert data, allowing short horizons with near-optimal performance and strict integer feasibility online for hybrid systems.
Categorical Optimization with Bayesian Anchored Latent Trust Regions for Structural Design under High-Dimensional Uncertainty
cs.LG 2026-04 unverdicted novelty 7.0

COBALT performs direct discrete optimization over high-dimensional categorical structural designs by anchoring latent embeddings as graphs and applying trust-region acquisition on additive Gaussian process surrogates ...
An Efficient Spatial Branch-and-Bound Algorithm for Global Optimization of Gaussian Process Posterior Mean Functions
math.OC 2026-04 conditional novelty 7.0

PALM-Mean combines sign-aware piecewise-linear relaxations of locally important kernel terms with closed-form analytic bounds on the rest inside a reduced-space branch-and-bound framework, yielding valid lower bounds ...
Collaborative Contextual Bayesian Optimization
cs.LG 2026-04 unverdicted novelty 7.0

CCBO enables collaborative contextual Bayesian optimization across clients with sublinear regret guarantees and shows substantial gains over non-collaborative methods in simulations and a hot rolling application even ...
Self-Supervised Laplace Approximation for Bayesian Uncertainty Quantification
stat.ML 2026-05 unverdicted novelty 6.0

SSLA approximates the posterior predictive distribution by refitting Bayesian models on self-predicted data, providing a sampling-free method that improves predictive calibration over classical Laplace approximations ...
Ensemble Distributionally Robust Bayesian Optimisation
cs.LG 2026-05 unverdicted novelty 6.0

A tractable ensemble distributionally robust Bayesian optimization method achieves improved sublinear regret bounds under context uncertainty.
Bayesian Algorithm for Collaborative Optimization with Application to Aircraft Design
math.OC 2026-05 conditional novelty 6.0

BACO replaces direct black-box calls in collaborative optimization with Gaussian process surrogates at both subsystem and system levels, achieving lower objectives and near-zero constraint violations on MDO benchmarks...
Inducing Permutation Invariant Priors in Bayesian Optimization for Carbon Capture and Storage Applications
cs.LG 2026-05 unverdicted novelty 6.0

A novel permutation-invariant GP kernel using set divergence is introduced for Bayesian optimization in CCS well placement and tested on synthetic benchmarks plus one real formation case.
HASOD: A Hybrid Adaptive Screening-Optimization Design for High-Dimensional Industrial Experiments
stat.ME 2026-04 unverdicted novelty 6.0

HASOD is a hybrid adaptive framework that unifies factor screening via a new CWESS statistic and response optimization using Gaussian processes, achieving 97% detection accuracy in simulations with asymptotic consiste...
On the Tradeoffs of On-Device Generative Models in Federated Predictive Maintenance Systems
cs.LG 2026-05 unverdicted novelty 5.0

Experiments on real industrial time series show that partial model sharing improves diffusion model performance in bandwidth-limited non-IID settings, while full sharing stabilizes GAN training but offers less robustn...
ORTHOBO: Orthogonal Bayesian Hyperparameter Optimization
cs.LG 2026-05 unverdicted novelty 5.0

OrthoBO introduces an orthogonal acquisition estimator subtracting an optimally weighted score-function control variate to reduce Monte Carlo variance, preserve the acquisition target, and improve ranking stability in...
Harnessing a 256-qubit Neutral Atom Simulator for Graph Classification
quant-ph 2026-05 unverdicted novelty 5.0

A 256-qubit neutral atom simulator computes Quantum Evolution Kernels for graph classification on the PROTEINS dataset, achieving slightly better performance than classical kernels.
Caliper-in-the-Loop: Black-Box Optimization for Hyperledger Fabric Performance Tuning
cs.DC 2026-05 unverdicted novelty 5.0

Bayesian optimization with dimensionality reduction improves Hyperledger Fabric throughput by up to 12% in a 317-dimensional configuration space via an automated Caliper benchmarking loop.
AgentOpt v0.1 Technical Report: Client-Side Optimization for LLM-Based Agent
cs.LG 2026-04 unverdicted novelty 5.0

AgentOpt introduces a framework-agnostic package that uses algorithms like UCB-E to find cost-effective model assignments in multi-step LLM agent pipelines, cutting evaluation budgets by 62-76% while maintaining near-...
Physics-informed automated surface reconstructing via low-energy electron diffraction based on Bayesian optimization
physics.comp-ph 2026-04 unverdicted novelty 5.0

A trust-region Bayesian optimization framework integrates LEED multiple scattering models to jointly optimize structural and experimental parameters for automated surface reconstruction.
Closed-Loop CO2 Storage Control With History-Based Reinforcement Learning and Latent Model-Based Adaptation
cs.LG 2026-05 unverdicted novelty 4.0

History-conditioned RL policies recover nearly all privileged-state performance with deployable well data, and latent model-based retuning outperforms direct model-free retuning under abnormal reservoir conditions.
Enhancing Model Based Derivative Free Optimization using Direct Search
math.OC 2026-04 unverdicted novelty 4.0

A hybrid switching approach integrates Direct Search into model-based derivative-free optimization, with a convergence proof for single-objective cases and empirical gains on ML tasks and CUTEr benchmarks.
BayMOTH: Bayesian optiMizatiOn with meTa-lookahead -- a simple approacH
cs.LG 2026-04 unverdicted novelty 4.0

BayMOTH unifies meta-Bayesian optimization with a usefulness-based fallback to lookahead, demonstrating competitive results on function optimization tasks even under low task relatedness.
Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey
cs.LG 2024-03 accept novelty 4.0

A comprehensive survey of PEFT algorithms for large models, covering their performance, overhead, applications, and real-world system implementations.
Efficient and Principled Scientific Discovery through Bayesian Optimization: A Tutorial
cs.LG 2026-04 accept novelty 2.0

Bayesian optimization automates the scientific discovery cycle by modeling observations with surrogate models and using acquisition functions to select experiments that balance known information with new exploration.

Reference graph

Works this paper leans on

101 extracted references · 101 canonical work pages · cited by 22 Pith papers

[1]

O., Shahriari, B., and Schmidt, M

Ahmed, M. O., Shahriari, B., and Schmidt, M. (2016). Do we need ``harmless'' B ayesian optimization and ``first-order'' B ayesian optimization. In Neural Information Processing Systems 2016 Workshop on Bayesian Optimization

work page 2016
[2]

Berger, J. O. (2013). Statistical Decision Theory and Bayesian Analysis . Springer Science & Business Media

work page 2013
[3]

Blum, J. R. (1954). Multidimensional stochastic approximation methods. The Annals of Mathematical Statistics , pages 737--744

work page 1954
[4]

Booker, A., Dennis, J., Frank, P., Serafini, D., Torczon, V., and Trosset, M. (1999). A rigorous framework for optimization of expensive functions by surrogates . Structural and Multidisciplinary Optimization , 17(1):1--13

work page 1999
[5]

Bottou, L. (2012). Stochastic gradient descent tricks. In Montavon, G., Orr, G. B., and M \"u ller, K. R., editors, Neural Networks: Tricks of the Trade , pages 421--436. Springer

work page 2012
[6]

Brochu, E., Cora, M., and de Freitas, N. (2009). A tutorial on bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. Technical Report TR-2009-023, Department of Computer Science, University of British Columbia. arXiv:1012.2599

work page Pith review arXiv 2009
[7]

Bull, A. D. (2011). Convergence rates of efficient global optimization algorithms. Journal of Machine Learning Research , 12(Oct):2879--2904

work page 2011
[8]

Calvin, J. (1997). Average performance of a class of adaptive algorithms for global optimization . The Annals of Applied Probability , 7(3):711--730

work page 1997
[9]

and Z ilinskas, A

Calvin, J. and Z ilinskas, A. (2005). One-dimensional global optimization for observations with noise . Computers & Mathematics with Applications , 50(1-2):157--169

work page 2005
[10]

and Z ilinskas, A

Calvin, J. and Z ilinskas, A. (1999). On the convergence of the P-algorithm for one-dimensional global optimization of smooth functions . Journal of Optimization Theory and Applications , 102(3):479--495

work page 1999
[11]

and Z ilinskas, A

Calvin, J. and Z ilinskas, A. (2000). One-dimensional P-algorithm with convergence rate O(n-3+ ) for smooth functions . Journal of Optimization Theory and Applications , 106(2):297--307

work page 2000
[12]

M., Kumarga, L., and Frazier, P

Cashore, J. M., Kumarga, L., and Frazier, P. I. (2016). Multi-step B ayesian optimization for one-dimensional feasibility determination. arXiv preprint arXiv:1607.03195

work page arXiv 2016
[13]

B., Williams, B

Chang, P. B., Williams, B. J., Bhalla, K. S. B., Belknap, T. W., Santner, T. J., Notz, W. I., and Bartel, D. L. (2001). Design and analysis of robust total joint replacements: finite element model experiments with environmental variables. Journal of Biomechanical Engineering , 123(3):239--246

work page 2001
[14]

Chick, S. E. and Inoue, K. (2001). New two-stage and sequential procedures for selecting the best simulated system. Operations Research , 49(5):732--743

work page 2001
[15]

Clark, C. E. (1961). The greatest of a finite set of random variables. Operations Research , 9(2):145--162

work page 1961
[16]

Cover, T. M. and Thomas, J. A. (2012). Elements of Information Theory . John Wiley & Sons

work page 2012
[17]

and Yushkevich, A

Dynkin, E. and Yushkevich, A. (1979). Controlled Markov Processes . Springer, New York

work page 1979
[18]

Forrester, A., S \'o bester, A., and Keane, A. (2008). Engineering Design via Surrogate Modelling: A Practical Guide . Wiley, West Sussex, UK

work page 2008
[19]

I., S \'o bester, A., and Keane, A

Forrester, A. I., S \'o bester, A., and Keane, A. J. (2007). Multi-fidelity optimization via surrogate modelling. In Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences , volume 463, pages 3251--3269. The Royal Society

work page 2007
[20]

Frazier, P., Powell, W., and Dayanik, S. (2009). The knowledge-gradient policy for correlated normal beliefs. INFORMS Journal on Computing , 21(4):599--613

work page 2009
[21]

Frazier, P. I. (2012). Tutorial: Optimization via simulation with bayesian statistics and dynamic programming. In Laroque, C., Himmelspach, J., Pasupathy, R., Rose, O., and Uhrmacher, A. M., editors, Proceedings of the 2012 Winter Simulation Conference Proceedings , pages 79--94, Piscataway, New Jersey. Institute of Electrical and Electronics Engineers, Inc

work page 2012
[22]

I., Powell, W

Frazier, P. I., Powell, W. B., and Dayanik, S. (2008). A knowledge-gradient policy for sequential information collection. SIAM Journal on Control and Optimization , 47(5):2410--2439

work page 2008
[23]

Frazier, P. I. and Wang, J. (2016). Bayesian optimization for materials design. In Lookman, T., Alexander, F. J., and Rajan, K., editors, Information Science for Materials Discovery and Design , pages 45--75. Springer

work page 2016
[24]

R., Kusner, M

Gardner, J. R., Kusner, M. J., Xu, Z. E., Weinberger, K. Q., and Cunningham, J. P. (2014). Bayesian optimization with inequality constraints. In ICML , pages 937--945

work page 2014
[25]

B., Stern, H

Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B. (2014). Bayesian Data Analysis , volume 2. CRC Press Boca Raton, FL

work page 2014
[26]

Ginsbourger, D., Le Riche, R., and Carraro, L. (2007). A multi-points criterion for deterministic parallel global optimization based on kriging . In International Conference on Nonconvex Programming, NCP07 , Rouen, France

work page 2007
[27]

Ginsbourger, D., Le Riche, R., and Carraro, L. (2010). Kriging is well-suited to parallelize optimization . In Tenne, Y. and Goh, C. K., editors, Computational Intelligence in Expensive Optimization Problems , volume 2, pages 131--162. Springer

work page 2010
[28]

and Riche, R

Ginsbourger, D. and Riche, R. (2010). Towards G aussian process-based optimization with finite time horizon. In Giovagnoli, A., Atkinson, A., Torsney, B., and May, C., editors, mODa 9--Advances in Model-Oriented Design and Analysis , pages 89--96. Springer

work page 2010
[29]

Gonz \'a lez, J., Osborne, M., and Lawrence, N. (2016). GLASSES : Relieving the myopia of bayesian optimisation. In Artificial Intelligence and Statistics , pages 790--799

work page 2016
[30]

Groot, P., Birlutiu, A., and Heskes, T. (2010). Bayesian monte carlo for the global optimization of expensive functions. In ECAI , pages 249--254

work page 2010
[31]

and Schuler, C

Hennig, P. and Schuler, C. J. (2012). Entropy search for information-efficient global optimization. Journal of Machine Learning Research , 13:1809--1837

work page 2012
[32]

M., Gelbart, M

Hern \'a ndez-Lobato, J. M., Gelbart, M. A., Hoffman, M. W., Adams, R. P., and Ghahramani, Z. (2015). Predictive entropy search for bayesian optimization with unknown constraints. In Proceedings of the 32nd International Conference on International Conference on Machine Learning-Volume 37 , pages 1699--1707. JMLR. org

work page 2015
[33]

M., Hoffman, M

Hern \'a ndez-Lobato, J. M., Hoffman, M. W., and Ghahramani, Z. (2014). Predictive entropy search for efficient global optimization of black-box functions. In Advances in neural information processing systems , pages 918--926

work page 2014
[34]

Ho, Y.-C., Cao, X., and Cassandras, C. (1983). Infinitesimal and finite perturbation analysis for queueing networks. Automatica , 19(4):439--445

work page 1983
[35]

Huang, D., Allen, T., Notz, W., and Miller, R. (2006). Sequential kriging optimization using multiple-fidelity evaluations . Structural and Multidisciplinary Optimization , 32(5):369--382

work page 2006
[36]

I., and Sznitman, R

Jedynak, B., Frazier, P. I., and Sznitman, R. (2012). Twenty questions with noise: B ayes optimal policies for entropy loss. Journal of Applied Probability , 49(1):114--136

work page 2012
[37]

R., Schonlau, M., and Welch, W

Jones, D. R., Schonlau, M., and Welch, W. J. (1998). Efficient global optimization of expensive black-box functions. Journal of Global Optimization , 13(4):455--492

work page 1998
[38]

Ju, S., Shiga, T., Feng, L., Hou, Z., Tsuda, K., and Shiomi, J. (2017). Designing nanostructures for phonon transport via B ayesian optimization. Physical Review X , 7

work page 2017
[39]

P., Littman, M

Kaelbling, L. P., Littman, M. L., and Moore, A. W. (1996). Reinforcement learning: A survey. Journal of Artificial Intelligence Research , 4:237--285

work page 1996
[40]

B., Schneider, J., and P \'o czos, B

Kandasamy, K., Dasarathy, G., Oliva, J. B., Schneider, J., and P \'o czos, B. (2016). Gaussian process bandit optimisation with multi-fidelity evaluations. In Advances in Neural Information Processing Systems , pages 992--1000

work page 2016
[41]

Kandasamy, K., Schneider, J., and P \'o czos, B. (2015). High dimensional bayesian optimisation and bandits via additive models. In International Conference on Machine Learning , pages 295--304

work page 2015
[42]

Keane, A. (2006). Statistical improvement criteria for use in multiobjective design optimization . AIAA Journal , 44(4):879--891

work page 2006
[43]

Kersting, K., Plagemann, C., Pfaff, P., and Burgard, W. (2007). Most likely heteroscedastic gaussian process regression. In Proceedings of the 24th International Conference on Machine learning , pages 393--400. ACM

work page 2007
[44]

Kleijnen, J. P. et al. (2008). Design and Analysis of Simulation Experiments , volume 20. Springer

work page 2008
[45]

Klein, A., Falkner, S., Bartels, S., Hennig, P., and Hutter, F. (2016). Fast B ayesian optimization of machine learning hyperparameters on large datasets. arXiv preprint arXiv:1605.07079

work page arXiv 2016
[46]

Knowles, J. (2006). ParEGO: A hybrid algorithm with on-line landscape approximation for expensive multiobjective optimization problems . IEEE Transactions on Evolutionary Computation , 10(1):50--66

work page 2006
[47]

Kushner, H. J. (1964). A new method of locating the maximum point of an arbitrary multipeak curve in the presence of noise. Journal of Basic Engineering , 86(1):97--106

work page 1964
[48]

L., and Willcox, K

Lam, R., Allaire, D. L., and Willcox, K. E. (2015). Multifidelity optimization using statistical surrogate modeling for non-hierarchical information sources. In 56th AIAA/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference , page 0143

work page 2015
[49]

Lam, R., Willcox, K., and Wolpert, D. H. (2016). Bayesian optimization with a finite budget: An approximate dynamic programming approach. In Advances in Neural Information Processing Systems , pages 883--891

work page 2016
[50]

Liu, D. C. and Nocedal, J. (1989). On the limited memory BFGS method for large scale optimization. Mathematical Programming , 45(1-3):503--528

work page 1989
[51]

Lizotte, D. (2008). Practical Bayesian Optimization . PhD thesis, University of Alberta

work page 2008
[52]

Lizotte, D., Wang, T., Bowling, M., and Schuurmans, D. (2007). Automatic gait optimization with G aussian process regression . In Proceedings of IJCAI , pages 944--949

work page 2007
[53]

and Teneketzis, D

Mahajan, A. and Teneketzis, D. (2008). Multi-armed bandit problems. In Hero, A., Casta\ n \' o n, D., Cochran, D., and Kastella, K., editors, Foundations and Applications of Sensor Management , pages 121--151. Springer

work page 2008
[54]

A., Mendiburu, A., and Hernando, L

Mart \' , R., Lozano, J. A., Mendiburu, A., and Hernando, L. (2016). Multi-start methods. Handbook of Heuristics , pages 1--21

work page 2016
[55]

A., and Roberts, S

McLeod, M., Osborne, M. A., and Roberts, S. J. (2017). Practical bayesian optimization for variable cost objectives. arXiv preprint arXiv:1703.04335

work page arXiv 2017
[56]

and Kleijnen, J

Mehdad, E. and Kleijnen, J. P. (2018). Efficient global optimisation for black-box simulation via sequential intrinsic kriging. Journal of the Operational Research Society , 69:1--13

work page 2018
[57]

and Segal, I

Milgrom, P. and Segal, I. (2002). Envelope theorems for arbitrary choice sets. Econometrica , 70(2):583--601

work page 2002
[58]

Minka, T. P. (2001). A family of algorithms for approximate B ayesian inference . PhD thesis, Massachusetts Institute of Technology

work page 2001
[59]

Mo c kus, J. (1975). On B ayesian methods for seeking the extremum. In Optimization Techniques IFIP Technical Conference , pages 400--404. Springer

work page 1975
[60]

Mo c kus, J. (1989). Bayesian Approach to Global Optimization: Theory and Applications . Kluwer Academic Publishers

work page 1989
[61]

and Mo c kus, L

Mo c kus, J. and Mo c kus, L. (1991). Bayesian approach to global optimization and application to multiobjective and constrained problems . Journal of Optimization Theory and Applications , 70(1):157--172

work page 1991
[62]

Mo c kus, J., Tiesis, V., and Z ilinskas, A. (1978). The application of B ayesian methods for seeking the extremum . In Dixon, L. and Szego, G., editors, Towards Global Optimisation , volume 2, pages 117--129. Elsevier Science Ltd., North Holland, Amsterdam

work page 1978
[63]

Neal, R. M. (2003). Slice sampling. Annals of Statistics , 31(3):705--741

work page 2003
[64]

M., Frazier, P

Negoescu, D. M., Frazier, P. I., and Powell, W. B. (2011). The knowledge gradient algorithm for sequencing experiments in drug discovery. INFORMS Journal on Computing , 23(1):46--363

work page 2011
[65]

A., Garnett, R., and Roberts, S

Osborne, M. A., Garnett, R., and Roberts, S. J. (2009). Gaussian processes for global optimization. In 3rd International Conference on Learning and Intelligent Optimization (LION3) , pages 1--15. Citeseer

work page 2009
[66]

Packwood, D. (2017). Bayesian Optimization for Materials Science , volume 3. Springer

work page 2017
[67]

Perez, S. (2015). Twitter acquires machine learning startup whetlab. TechCrunch . Accessed July 3, 2018

work page 2015
[68]

Poloczek, M., Wang, J., and Frazier, P. (2017). Multi-information source optimization. In Advances in Neural Information Processing Systems , pages 4291--4301

work page 2017
[69]

Powell, W. B. (2007). Approximate Dynamic Programming: Solving the Curses of Dimensionality . John Wiley & Sons, New York

work page 2007
[70]

and Williams, C

Rasmussen, C. and Williams, C. (2006). Gaussian Processes for Machine Learning . MIT Press, Cambridge, MA

work page 2006
[71]

and Shoemaker, C

Regis, R. and Shoemaker, C. (2005). Constrained global optimization of expensive black box functions using radial basis functions. Journal of Global Optimization , 31(1):153--171

work page 2005
[72]

and Shoemaker, C

Regis, R. and Shoemaker, C. (2007a). Improved strategies for radial basis function methods for global optimization. Journal of Global Optimization , 37(1):113--135

work page
[73]

and Shoemaker, C

Regis, R. and Shoemaker, C. (2007b). Parallel radial basis function methods for the global optimization of expensive functions. European Journal of Operational Research , 182(2):514--535

work page
[74]

and Monro, S

Robbins, H. and Monro, S. (1951). A stochastic approximation method. The Annals of Mathematical Statistics , 22(3):400--407

work page 1951
[75]

Roustant, O., Ginsbourger, D., and Deville, Y. (2012). Dicekriging, diceoptim: Two r packages for the analysis of computer experiments by kriging-based metamodeling and optimization. Journal of Statistical Software, Articles , 51(1):1--55

work page 2012
[76]

L., and Staum, J

Salemi, P., Nelson, B. L., and Staum, J. (2014). Discrete optimization via simulation using G aussian M arkov random fields. In Proceedings of the 2014 Winter Simulation Conference , pages 3809--3820. IEEE Press

work page 2014
[77]

Sasena, M. (2002). Flexibility and Efficiency Enhancements for Constrained Global Design Optimization with Kriging Approximations . PhD thesis, University of Michigan

work page 2002
[78]

J., and Jones, D

Schonlau, M., Welch, W. J., and Jones, D. R. (1998). Global versus local search in constrained optimization of computer models. Lecture Notes --- Monograph Series , 34:11--25

work page 1998
[79]

I., and Powell, W

Scott, W., Frazier, P. I., and Powell, W. B. (2011). The correlated knowledge gradient for simulation optimization of continuous parameters using G aussian process regression. SIAM Journal on Optimization , 21(3):996--1026

work page 2011
[80]

Seko, A., Togo, A., Hayashi, H., Tsuda, K., Chaput, L., and Tanaka, I. (2015). Prediction of low-thermal-conductivity compounds with first-principles anharmonic lattice-dynamics calculations and B ayesian optimization. Physical Review Letters , 115

work page 2015

Showing first 80 references.