pith. machine review for the scientific record.

arxiv: 1807.02811 · v1 · submitted 2018-07-08 · 📊 stat.ML · cs.LG · math.OC

A Tutorial on Bayesian Optimization

Pith reviewed 2026-05-13 14:12 UTC · model grok-4.3

classification 📊 stat.ML · cs.LG · math.OC
keywords Bayesian optimization · Gaussian process regression · acquisition functions · expected improvement · entropy search · knowledge gradient · surrogate models · noisy evaluations

The pith

Bayesian optimization builds a Gaussian process surrogate for an expensive objective and uses an acquisition function to choose each next evaluation point.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The tutorial presents Bayesian optimization as a method for finding optima of functions that require minutes or hours per evaluation. It models the unknown objective with Gaussian process regression to produce both a mean prediction and a variance at every unevaluated location. From this surrogate it defines an acquisition function that scores candidate points according to the expected value of the information they would provide. The paper supplies a decision-theoretic generalization of expected improvement that remains valid when observations contain noise. It also surveys extensions to parallel evaluations, multi-fidelity sources, constraints, and derivative information.

Core claim

Bayesian optimization constructs a surrogate model of the objective function using Gaussian process regression to capture both the predicted value and the uncertainty at unevaluated points. It then defines an acquisition function from this model, such as expected improvement, entropy search, or knowledge gradient, to choose the next point to evaluate by maximizing the expected utility of the information gained. The tutorial extends this to noisy settings by providing a decision-theoretically justified version of expected improvement that accounts for observation noise.
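
To make the expected-improvement step concrete, here is a minimal sketch of the closed-form, noise-free EI score for maximization, given the Gaussian process posterior mean `mu` and standard deviation `sigma` at a candidate point and the best value observed so far. The function names and the candidate values are illustrative assumptions, not taken from the tutorial; the formula is the standard one for noise-free evaluations.

```python
import math

def normal_pdf(z: float) -> float:
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z: float) -> float:
    """Standard normal CDF Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mu: float, sigma: float, best: float) -> float:
    """Noise-free expected improvement for maximization at a single point.

    mu, sigma: GP posterior mean and standard deviation of f(x).
    best: largest objective value observed so far.
    """
    delta = mu - best
    if sigma <= 0.0:
        # No posterior uncertainty left at x: the improvement is deterministic.
        return max(delta, 0.0)
    # Equivalent to delta * Phi(delta / sigma) + sigma * phi(delta / sigma).
    return (max(delta, 0.0)
            + sigma * normal_pdf(delta / sigma)
            - abs(delta) * normal_cdf(-abs(delta) / sigma))

# Score a few hypothetical candidates (mu, sigma) against an incumbent of 1.0.
candidates = {"a": (1.2, 0.05), "b": (0.9, 0.6), "c": (1.0, 0.0)}
scores = {k: expected_improvement(m, s, best=1.0) for k, (m, s) in candidates.items()}
next_point = max(scores, key=scores.get)
```

The candidate with the largest score is evaluated next. Candidate "b" has a lower mean than the incumbent but still earns a sizeable score from its uncertainty, which is how EI trades off exploitation against exploration. In practice the score is maximized over a continuous domain with a numerical optimizer rather than over a short list.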

What carries the argument

Gaussian process regression that supplies a posterior mean and variance, paired with an acquisition function that converts this posterior into a scalar score for choosing the next evaluation location.
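
A minimal sketch of the posterior mean and variance that the surrogate supplies, assuming a one-dimensional input, a zero prior mean, and a squared-exponential kernel with fixed, hand-picked hyperparameters (the tutorial also discusses other mean functions, kernels, and fitting hyperparameters to data). The names `se_kernel` and `gp_posterior` are hypothetical, and the hand-written Cholesky factorization only keeps the example dependency-free; a real implementation would call a linear-algebra library. The `noise_var` argument corresponds to the noisy observation model y = f(x) + ε.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]

def se_kernel(a: float, b: float, variance: float = 1.0, lengthscale: float = 0.3) -> float:
    """Squared-exponential covariance between two 1-D inputs (assumed kernel)."""
    return variance * math.exp(-0.5 * ((a - b) / lengthscale) ** 2)

def cholesky(m: Matrix) -> Matrix:
    """Lower-triangular L with L L^T = m, for symmetric positive-definite m."""
    n = len(m)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(m[i][i] - s)
            else:
                L[i][j] = (m[i][j] - s) / L[j][j]
    return L

def solve_lower(L: Matrix, b: Sequence[float]) -> List[float]:
    """Forward substitution: solve L y = b."""
    y: List[float] = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_upper_t(L: Matrix, y: Sequence[float]) -> List[float]:
    """Back substitution with the transpose: solve L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_posterior(xs: Sequence[float], ys: Sequence[float], x_new: float,
                 noise_var: float = 1e-8) -> Tuple[float, float]:
    """Posterior mean and variance of f(x_new) under a zero-mean GP prior.

    noise_var is the observation-noise variance; a tiny value approximates
    noise-free evaluations while keeping the kernel matrix invertible.
    """
    K = [[se_kernel(a, b) + (noise_var if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, ys))  # alpha = K^{-1} y
    k_star = [se_kernel(x_new, a) for a in xs]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)  # v = L^{-1} k_*
    var = se_kernel(x_new, x_new) - sum(t * t for t in v)
    return mean, max(var, 0.0)  # clip tiny negative round-off
```

For example, with observations at 0.0, 0.5 and 1.0, the posterior variance is close to zero at the observed points and returns toward the prior variance of 1.0 far from the data; these two quantities are exactly what an acquisition function consumes.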

If this is right

  • The method requires far fewer evaluations than exhaustive search when each evaluation is expensive.
  • The noise-aware expected improvement permits reliable optimization even when function values are corrupted by stochastic noise.
  • Parallel and multi-fidelity extensions allow the same framework to use batches of evaluations or cheaper proxy models.
  • Derivative information, when available, can be incorporated directly into the Gaussian process to sharpen the surrogate.
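  • Constraints and multi-task formulations are handled mainly by modifying the acquisition function, with additional Gaussian process models for expensive constraints or a shared model across tasks, rather than by abandoning the surrogate-plus-acquisition structure.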
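
The constraint case can be illustrated with one common construction from the constrained Bayesian optimization literature: multiply expected improvement over the best feasible value by the probability that the candidate satisfies the constraint c(x) ≤ 0. This sketch assumes the constraint has its own Gaussian process posterior, independent of the objective's, so the expectation factorizes into a product; the function names are hypothetical and the tutorial's exact formulation may differ.

```python
import math

def _pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def _cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mu: float, sigma: float, best: float) -> float:
    """Noise-free EI for maximization: delta*Phi(delta/sigma) + sigma*phi(delta/sigma)."""
    delta = mu - best
    if sigma <= 0.0:
        return max(delta, 0.0)
    z = delta / sigma
    return delta * _cdf(z) + sigma * _pdf(z)

def prob_feasible(mu_c: float, sigma_c: float) -> float:
    """P(c(x) <= 0) when the constraint value c(x) has posterior N(mu_c, sigma_c^2)."""
    if sigma_c <= 0.0:
        return 1.0 if mu_c <= 0.0 else 0.0
    return _cdf(-mu_c / sigma_c)

def constrained_ei(mu: float, sigma: float, best_feasible: float,
                   mu_c: float, sigma_c: float) -> float:
    """EI over the best feasible value, discounted by the probability that x is feasible.

    Assumes independent GP posteriors for the objective and the constraint,
    so E[(f(x) - best)^+ * 1{c(x) <= 0}] splits into EI * P(feasible).
    """
    return expected_improvement(mu, sigma, best_feasible) * prob_feasible(mu_c, sigma_c)
```

Only the acquisition function changes here: the objective's surrogate is untouched, and a second surrogate supplies `mu_c` and `sigma_c` for the constraint.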

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same surrogate-plus-acquisition structure can be reused for hyperparameter tuning in machine learning, where each trial is costly.
  • The formal noisy expected improvement may outperform earlier heuristic adjustments on real data sets that contain measurement error.
  • For problems with more than roughly twenty dimensions, a Gaussian process surrogate typically needs many more evaluations to model the objective accurately and the acquisition function becomes harder to optimize, suggesting a natural boundary for the method's direct application.

Load-bearing premise

A Gaussian process supplies an adequate probabilistic model of the unknown objective function.

What would settle it

On a standard benchmark function whose optimum and noise level are known in advance, compare the number of evaluations the method needs to come within a fixed tolerance of the known optimum, averaged over repeated runs, against random search or a simple grid; if the method needs as many evaluations or more, the claimed efficiency does not hold.
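
This test can be sketched as a small harness that counts evaluations until a sampled point comes within `tol` of the known optimum. The benchmark `bench`, its maximum at x = 0.3, the noise level and the tolerance are hypothetical choices. Only the random-search baseline is shown; a Bayesian optimization proposer would plug in through the same assumed `propose(history, rng)` interface.

```python
import random
from typing import Callable, List, Optional, Tuple

History = List[Tuple[float, float]]  # (x, noisy observation y)

def evaluations_to_reach(propose: Callable[[History, random.Random], float],
                         f: Callable[[float], float],
                         f_opt: float, tol: float, budget: int,
                         noise_sd: float = 0.0, seed: int = 0) -> Optional[int]:
    """Number of evaluations until a sampled point is within tol of the known optimum.

    The optimizer only sees noisy values y = f(x) + eps; progress is judged
    on the true f, which is known for the benchmark.
    Returns None if the budget runs out first.
    """
    rng = random.Random(seed)
    history: History = []
    for n in range(1, budget + 1):
        x = propose(history, rng)
        history.append((x, f(x) + rng.gauss(0.0, noise_sd)))
        if f_opt - f(x) <= tol:
            return n
    return None

def random_search(history: History, rng: random.Random) -> float:
    """Baseline proposer: ignores the history and samples uniformly on [0, 1]."""
    return rng.uniform(0.0, 1.0)

def bench(x: float) -> float:
    """Hypothetical benchmark with known maximum 0 at x = 0.3."""
    return -(x - 0.3) ** 2

counts = [evaluations_to_reach(random_search, bench, 0.0, 1e-3, 10_000,
                               noise_sd=0.1, seed=s) for s in range(20)]
```

Comparing the distribution of `counts` across seeds for both proposers, rather than a single run, keeps the verdict from hinging on one lucky draw. A stricter variant would judge the point the optimizer recommends rather than the first sampled point that lands near the optimum.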

Original abstract

Bayesian optimization is an approach to optimizing objective functions that take a long time (minutes or hours) to evaluate. It is best-suited for optimization over continuous domains of less than 20 dimensions, and tolerates stochastic noise in function evaluations. It builds a surrogate for the objective and quantifies the uncertainty in that surrogate using a Bayesian machine learning technique, Gaussian process regression, and then uses an acquisition function defined from this surrogate to decide where to sample. In this tutorial, we describe how Bayesian optimization works, including Gaussian process regression and three common acquisition functions: expected improvement, entropy search, and knowledge gradient. We then discuss more advanced techniques, including running multiple function evaluations in parallel, multi-fidelity and multi-information source optimization, expensive-to-evaluate constraints, random environmental conditions, multi-task Bayesian optimization, and the inclusion of derivative information. We conclude with a discussion of Bayesian optimization software and future research directions in the field. Within our tutorial material we provide a generalization of expected improvement to noisy evaluations, beyond the noise-free setting where it is more commonly applied. This generalization is justified by a formal decision-theoretic argument, standing in contrast to previous ad hoc modifications.
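
The knowledge-gradient acquisition named in the abstract can be sketched on a finite set of alternatives: it is the expected increase in the largest posterior mean after one more noisy observation. This sketch uses Monte Carlo rather than the exact computation from the knowledge-gradient literature, and a finite set rather than the continuous domains the tutorial targets; all names are hypothetical.

```python
import math
import random
from typing import Sequence

def knowledge_gradient_mc(mu: Sequence[float], cov: Sequence[Sequence[float]],
                          j: int, noise_var: float,
                          n_samples: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo knowledge gradient for sampling alternative j once.

    mu, cov: current posterior mean vector and covariance matrix over a
    finite set of alternatives; noise_var: observation-noise variance.
    KG = E[max_i mu_next_i] - max_i mu_i, where one noisy observation at j
    shifts the whole mean vector along column j of cov.
    """
    rng = random.Random(seed)
    scale = math.sqrt(cov[j][j] + noise_var)
    sigma_tilde = [cov[i][j] / scale for i in range(len(mu))]
    best_now = max(mu)
    total = 0.0
    for _ in range(n_samples):
        z = rng.gauss(0.0, 1.0)  # standardized predictive draw of y at j
        total += max(m + s * z for m, s in zip(mu, sigma_tilde))
    return total / n_samples - best_now

# Hypothetical posterior over three alternatives; sample where KG is largest.
mu = [0.0, 0.2, 0.1]
cov = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 0.3]]
kg = [knowledge_gradient_mc(mu, cov, j, noise_var=0.1) for j in range(3)]
next_j = max(range(3), key=lambda j: kg[j])
```

Unlike expected improvement, this score values an observation by how much it could change the final recommendation, including through correlated alternatives that were not sampled.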

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

1 major / 3 minor

Summary. This tutorial describes Bayesian optimization for expensive-to-evaluate black-box functions over low-dimensional continuous domains. It covers Gaussian process regression as the surrogate, three acquisition functions (expected improvement, entropy search, knowledge gradient), and extensions including parallel evaluations, multi-fidelity optimization, constraints, random environmental conditions, multi-task settings, and derivative information. The central technical contribution is a generalization of expected improvement to noisy observations, derived via a formal decision-theoretic argument rather than ad hoc modifications.

Significance. If the decision-theoretic derivation of noisy expected improvement is correct, the paper supplies a clear, self-contained reference that consolidates established material while adding a principled treatment of noise. The breadth of advanced topics and the explicit contrast with prior ad hoc approaches make it potentially useful for both newcomers and practitioners seeking a unified exposition.

major comments (1)
  1. The decision-theoretic justification for the noisy-EI generalization is presented as the key novelty, yet the manuscript does not include an explicit reduction showing that the new acquisition function recovers the standard noise-free EI when observation noise variance approaches zero. This step is load-bearing for the claim that the generalization is principled rather than ad hoc.
minor comments (3)
  1. Notation for the noisy observation model (e.g., y = f(x) + ε) is introduced without a dedicated equation number; cross-referencing would improve readability in the acquisition-function sections.
  2. Several figures illustrating GP posterior samples and acquisition surfaces lack axis labels or legends indicating the noise level, making it difficult to connect the visuals to the noisy-EI derivation.
  3. The discussion of software packages in the final section would benefit from explicit version numbers or DOIs for the cited libraries to aid reproducibility.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful reading of the manuscript and for the positive assessment. We address the single major comment below.

Point-by-point responses
  1. Referee: The decision-theoretic justification for the noisy-EI generalization is presented as the key novelty, yet the manuscript does not include an explicit reduction showing that the new acquisition function recovers the standard noise-free EI when observation noise variance approaches zero. This step is load-bearing for the claim that the generalization is principled rather than ad hoc.

    Authors: We agree that an explicit reduction to the noise-free case would make the principled character of the generalization clearer. In the revised manuscript we will add a short derivation (in the main text or an appendix) showing that the proposed acquisition function recovers standard expected improvement in the limit of vanishing observation noise. The argument follows directly from the decision-theoretic construction once the observation-noise variance is set to zero.

    revision: yes
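
The reduction the rebuttal promises can be sketched as follows. This assumes, rather than quotes, that the noisy generalization is the expected one-step increase in the largest posterior mean among evaluated points, with noise model y = f(x) + ε; the notation is hypothetical and the tutorial's own construction may differ in detail.

```latex
% Assumed form of the noisy generalization (hypothetical notation)
\mathrm{EI}_n(x) = \mathbb{E}_n\!\left[\mu^*_{n+1} - \mu^*_n \,\middle|\, x_{n+1} = x\right],
\qquad \mu^*_n = \max_{i \le n} \mu_n(x_i),
\qquad y_{n+1} = f(x_{n+1}) + \varepsilon,\; \varepsilon \sim \mathcal{N}(0, \lambda^2).

% As lambda^2 -> 0 the posterior mean interpolates the data: mu_n(x_i) = f(x_i)
\lambda^2 \to 0:\quad \mu^*_n = f^*_n := \max_{i \le n} f(x_i),
\qquad \mu^*_{n+1} = \max\{f^*_n,\, f(x)\}

\Rightarrow\quad \mathrm{EI}_n(x) = \mathbb{E}_n\!\left[(f(x) - f^*_n)^+\right]
= \Delta_n(x)\,\Phi\!\left(\tfrac{\Delta_n(x)}{\sigma_n(x)}\right)
+ \sigma_n(x)\,\varphi\!\left(\tfrac{\Delta_n(x)}{\sigma_n(x)}\right),
\qquad \Delta_n(x) = \mu_n(x) - f^*_n.
```

The last line is the standard noise-free expected improvement, where f(x) has posterior N(μ_n(x), σ_n²(x)) and Φ and φ are the standard normal CDF and density.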

Circularity Check

0 steps flagged

No significant circularity

The paper is a tutorial that explains Gaussian process regression and standard acquisition functions (expected improvement, entropy search, knowledge gradient) before presenting a generalization of expected improvement to noisy observations. This generalization is derived from a decision-theoretic argument that starts from the definition of the acquisition function and the posterior over the objective; the derivation does not reduce to a fitted parameter renamed as a prediction, a self-referential definition, or a load-bearing self-citation. Every step rests on standard external results (Gaussian process theory and decision theory), and no equation is shown to be equivalent to its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The work is a tutorial that relies on standard background assumptions from the Gaussian process and Bayesian optimization literature rather than introducing new free parameters or entities.

axioms (1)
  • domain assumption: Gaussian process regression supplies a suitable surrogate model that quantifies uncertainty for the objective function
    Stated as the foundation of the method in the abstract.

pith-pipeline@v0.9.0 · 5494 in / 1162 out tokens · 70015 ms · 2026-05-13T14:12:39.323428+00:00 · methodology

Forward citations

Cited by 22 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Elicitation-Augmented Bayesian Optimization

    cs.LG 2026-05 unverdicted novelty 7.0

    A cost-aware value-of-information acquisition function is derived to balance direct observations against noisy pairwise human comparisons in Bayesian optimization, approaching the convex hull of the individual informa...

  2. Bayesian Optimization with Structured Measurements: A Vector-Valued RKHS Framework

    cs.LG 2026-05 unverdicted novelty 7.0

    Proposes a vector-valued RKHS framework for Bayesian optimization with structured measurements, deriving concentration bounds and UCB-based regret guarantees that recover sublinear rates.

  3. Learning myopic mixed-integer nonlinear model predictive control from expert demonstrations

    eess.SY 2026-05 unverdicted novelty 7.0

    A myopic MINMPC framework learns a value function offline via inverse optimization from expert data, allowing short horizons with near-optimal performance and strict integer feasibility online for hybrid systems.

  4. Categorical Optimization with Bayesian Anchored Latent Trust Regions for Structural Design under High-Dimensional Uncertainty

    cs.LG 2026-04 unverdicted novelty 7.0

    COBALT performs direct discrete optimization over high-dimensional categorical structural designs by anchoring latent embeddings as graphs and applying trust-region acquisition on additive Gaussian process surrogates ...

  5. An Efficient Spatial Branch-and-Bound Algorithm for Global Optimization of Gaussian Process Posterior Mean Functions

    math.OC 2026-04 conditional novelty 7.0

    PALM-Mean combines sign-aware piecewise-linear relaxations of locally important kernel terms with closed-form analytic bounds on the rest inside a reduced-space branch-and-bound framework, yielding valid lower bounds ...

  6. Collaborative Contextual Bayesian Optimization

    cs.LG 2026-04 unverdicted novelty 7.0

    CCBO enables collaborative contextual Bayesian optimization across clients with sublinear regret guarantees and shows substantial gains over non-collaborative methods in simulations and a hot rolling application even ...

  7. Self-Supervised Laplace Approximation for Bayesian Uncertainty Quantification

    stat.ML 2026-05 unverdicted novelty 6.0

    SSLA approximates the posterior predictive distribution by refitting Bayesian models on self-predicted data, providing a sampling-free method that improves predictive calibration over classical Laplace approximations ...

  8. Ensemble Distributionally Robust Bayesian Optimisation

    cs.LG 2026-05 unverdicted novelty 6.0

    A tractable ensemble distributionally robust Bayesian optimization method achieves improved sublinear regret bounds under context uncertainty.

  9. Bayesian Algorithm for Collaborative Optimization with Application to Aircraft Design

    math.OC 2026-05 conditional novelty 6.0

    BACO replaces direct black-box calls in collaborative optimization with Gaussian process surrogates at both subsystem and system levels, achieving lower objectives and near-zero constraint violations on MDO benchmarks...

  10. Inducing Permutation Invariant Priors in Bayesian Optimization for Carbon Capture and Storage Applications

    cs.LG 2026-05 unverdicted novelty 6.0

    A novel permutation-invariant GP kernel using set divergence is introduced for Bayesian optimization in CCS well placement and tested on synthetic benchmarks plus one real formation case.

  11. HASOD: A Hybrid Adaptive Screening-Optimization Design for High-Dimensional Industrial Experiments

    stat.ME 2026-04 unverdicted novelty 6.0

    HASOD is a hybrid adaptive framework that unifies factor screening via a new CWESS statistic and response optimization using Gaussian processes, achieving 97% detection accuracy in simulations with asymptotic consiste...

  12. On the Tradeoffs of On-Device Generative Models in Federated Predictive Maintenance Systems

    cs.LG 2026-05 unverdicted novelty 5.0

    Experiments on real industrial time series show that partial model sharing improves diffusion model performance in bandwidth-limited non-IID settings, while full sharing stabilizes GAN training but offers less robustn...

  13. ORTHOBO: Orthogonal Bayesian Hyperparameter Optimization

    cs.LG 2026-05 unverdicted novelty 5.0

    OrthoBO introduces an orthogonal acquisition estimator subtracting an optimally weighted score-function control variate to reduce Monte Carlo variance, preserve the acquisition target, and improve ranking stability in...

  14. Harnessing a 256-qubit Neutral Atom Simulator for Graph Classification

    quant-ph 2026-05 unverdicted novelty 5.0

    A 256-qubit neutral atom simulator computes Quantum Evolution Kernels for graph classification on the PROTEINS dataset, achieving slightly better performance than classical kernels.

  15. Caliper-in-the-Loop: Black-Box Optimization for Hyperledger Fabric Performance Tuning

    cs.DC 2026-05 unverdicted novelty 5.0

    Bayesian optimization with dimensionality reduction improves Hyperledger Fabric throughput by up to 12% in a 317-dimensional configuration space via an automated Caliper benchmarking loop.

  16. AgentOpt v0.1 Technical Report: Client-Side Optimization for LLM-Based Agent

    cs.LG 2026-04 unverdicted novelty 5.0

    AgentOpt introduces a framework-agnostic package that uses algorithms like UCB-E to find cost-effective model assignments in multi-step LLM agent pipelines, cutting evaluation budgets by 62-76% while maintaining near-...

  17. Physics-informed automated surface reconstructing via low-energy electron diffraction based on Bayesian optimization

    physics.comp-ph 2026-04 unverdicted novelty 5.0

    A trust-region Bayesian optimization framework integrates LEED multiple scattering models to jointly optimize structural and experimental parameters for automated surface reconstruction.

  18. Closed-Loop CO2 Storage Control With History-Based Reinforcement Learning and Latent Model-Based Adaptation

    cs.LG 2026-05 unverdicted novelty 4.0

    History-conditioned RL policies recover nearly all privileged-state performance with deployable well data, and latent model-based retuning outperforms direct model-free retuning under abnormal reservoir conditions.

  19. Enhancing Model Based Derivative Free Optimization using Direct Search

    math.OC 2026-04 unverdicted novelty 4.0

    A hybrid switching approach integrates Direct Search into model-based derivative-free optimization, with a convergence proof for single-objective cases and empirical gains on ML tasks and CUTEr benchmarks.

  20. BayMOTH: Bayesian optiMizatiOn with meTa-lookahead -- a simple approacH

    cs.LG 2026-04 unverdicted novelty 4.0

    BayMOTH unifies meta-Bayesian optimization with a usefulness-based fallback to lookahead, demonstrating competitive results on function optimization tasks even under low task relatedness.

  21. Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey

    cs.LG 2024-03 accept novelty 4.0

    A comprehensive survey of PEFT algorithms for large models, covering their performance, overhead, applications, and real-world system implementations.

  22. Efficient and Principled Scientific Discovery through Bayesian Optimization: A Tutorial

    cs.LG 2026-04 accept novelty 2.0

    Bayesian optimization automates the scientific discovery cycle by modeling observations with surrogate models and using acquisition functions to select experiments that balance known information with new exploration.

Reference graph

Works this paper leans on

101 extracted references · 101 canonical work pages · cited by 22 Pith papers
