A Tutorial on Bayesian Optimization
Pith reviewed 2026-05-13 14:12 UTC · model grok-4.3
The pith
Bayesian optimization builds a Gaussian process surrogate for an expensive objective and uses an acquisition function to choose each next evaluation point.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Bayesian optimization constructs a surrogate model of the objective function using Gaussian process regression to capture both the predicted value and the uncertainty at unevaluated points. It then defines an acquisition function from this model, such as expected improvement, entropy search, or knowledge gradient, to choose the next point to evaluate by maximizing the expected utility of the information gained. The tutorial extends this to noisy settings by providing a decision-theoretically justified version of expected improvement that accounts for observation noise.
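To make the loop concrete, here is a minimal sketch in Python, assuming scikit-learn's Gaussian process regressor; the objective function, domain, candidate grid, and budget are illustrative assumptions, not taken from the tutorial.

import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def f(x):
    # Illustrative "expensive" objective to maximize on [0, 1] (assumed for this sketch).
    return -np.sin(3 * x) - x**2 + 0.7 * x

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(3, 1))                  # small initial design
y = f(X).ravel()
candidates = np.linspace(0, 1, 501).reshape(-1, 1)  # grid where the acquisition is scored

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
for _ in range(10):                                 # each iteration = one expensive evaluation
    gp.fit(X, y)
    mu, sigma = gp.predict(candidates, return_std=True)   # posterior mean and std
    best = y.max()
    z = (mu - best) / np.maximum(sigma, 1e-9)
    ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)  # expected improvement
    x_next = candidates[np.argmax(ei)].reshape(1, -1)     # maximize the acquisition
    X = np.vstack([X, x_next])
    y = np.append(y, f(x_next).ravel())

print("best observed:", float(X[np.argmax(y)][0]), float(y.max()))

The surrogate is refit after every evaluation, so the acquisition always scores candidates against the current posterior.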
What carries the argument
Gaussian process regression that supplies a posterior mean and variance, paired with an acquisition function that converts this posterior into a scalar score for choosing the next evaluation location.
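In symbols (a standard identity for Gaussian posteriors, not a quotation from the tutorial): if the posterior at x after n evaluations is normal with mean \mu_n(x) and standard deviation \sigma_n(x), and f_n^* = \max_{i \le n} f(x_i) is the best value observed so far, the expected-improvement score has the closed form

\mathrm{EI}_n(x) = \mathbb{E}_n\big[\max(f(x) - f_n^*,\, 0)\big] = \big(\mu_n(x) - f_n^*\big)\,\Phi(z) + \sigma_n(x)\,\varphi(z), \qquad z = \frac{\mu_n(x) - f_n^*}{\sigma_n(x)},

where \Phi and \varphi are the standard normal CDF and density. The next evaluation point is the maximizer of this score.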
If this is right
- The method requires far fewer evaluations than exhaustive search when each evaluation is expensive.
- The noise-aware expected improvement permits reliable optimization even when function values are corrupted by stochastic noise.
- Parallel and multi-fidelity extensions allow the same framework to use batches of evaluations or cheaper proxy models.
- Derivative information, when available, can be incorporated directly into the Gaussian process to sharpen the surrogate.
- Constraints and multi-task formulations are handled by modifying the acquisition function without changing the core surrogate.
Where Pith is reading between the lines
- The same surrogate-plus-acquisition structure can be reused for hyperparameter tuning in machine learning, where each trial is costly.
- The formal noisy expected improvement may outperform earlier heuristic adjustments on real data sets that contain measurement error.
- For problems beyond roughly twenty dimensions the Gaussian process surrogate becomes hard to fit accurately and the acquisition function hard to optimize, suggesting a natural boundary for the method's direct application.
Load-bearing premise
A Gaussian process supplies an adequate probabilistic model of the unknown objective function.
What would settle it
On a standard benchmark function whose optimum and noise level are known in advance, compare the number of evaluations required by the method against random search or a simple grid; if the method needs as many or more evaluations to reach the known optimum, the claimed efficiency does not hold.
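A hedged sketch of that test in Python, assuming the one-dimensional Forrester benchmark, whose global minimum (about -6.02 near x = 0.757) is known; the tolerance, budget, and seeding scheme are illustrative choices, not prescribed by the tutorial.

import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def forrester(x):
    # Standard 1-D benchmark; global minimum approx. -6.02 near x = 0.757.
    return (6 * x - 2) ** 2 * np.sin(12 * x - 4)

f_star = forrester(0.7572)   # approximate known optimum value

def evals_to_hit(propose, budget=50, tol=0.5, seed=0):
    # Count evaluations until the best observed value is within tol of the known optimum.
    rng = np.random.default_rng(seed)
    X, y = np.empty((0, 1)), np.empty(0)
    for n in range(1, budget + 1):
        x = propose(X, y, rng)
        X = np.vstack([X, [[x]]])
        y = np.append(y, forrester(x))
        if y.min() - f_star < tol:
            return n
    return budget

def random_propose(X, y, rng):
    return rng.uniform(0, 1)

def bo_propose(X, y, rng):
    if len(y) < 3:                       # seed the surrogate with a few random points
        return rng.uniform(0, 1)
    grid = np.linspace(0, 1, 401).reshape(-1, 1)
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, y)
    mu, sigma = gp.predict(grid, return_std=True)
    z = (y.min() - mu) / np.maximum(sigma, 1e-9)             # EI in minimization form
    ei = (y.min() - mu) * norm.cdf(z) + sigma * norm.pdf(z)
    return float(grid[np.argmax(ei), 0])

print("random search evaluations:", evals_to_hit(random_propose))
print("BO (EI) evaluations:      ", evals_to_hit(bo_propose))

If the BO count is not clearly smaller than the random-search count across seeds, the claimed sample efficiency fails on this benchmark.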
read the original abstract
Bayesian optimization is an approach to optimizing objective functions that take a long time (minutes or hours) to evaluate. It is best-suited for optimization over continuous domains of less than 20 dimensions, and tolerates stochastic noise in function evaluations. It builds a surrogate for the objective and quantifies the uncertainty in that surrogate using a Bayesian machine learning technique, Gaussian process regression, and then uses an acquisition function defined from this surrogate to decide where to sample. In this tutorial, we describe how Bayesian optimization works, including Gaussian process regression and three common acquisition functions: expected improvement, entropy search, and knowledge gradient. We then discuss more advanced techniques, including running multiple function evaluations in parallel, multi-fidelity and multi-information source optimization, expensive-to-evaluate constraints, random environmental conditions, multi-task Bayesian optimization, and the inclusion of derivative information. We conclude with a discussion of Bayesian optimization software and future research directions in the field. Within our tutorial material we provide a generalization of expected improvement to noisy evaluations, beyond the noise-free setting where it is more commonly applied. This generalization is justified by a formal decision-theoretic argument, standing in contrast to previous ad hoc modifications.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This tutorial describes Bayesian optimization for expensive-to-evaluate black-box functions over low-dimensional continuous domains. It covers Gaussian process regression as the surrogate, three acquisition functions (expected improvement, entropy search, knowledge gradient), and extensions including parallel evaluations, multi-fidelity optimization, constraints, random environmental conditions, multi-task settings, and derivative information. The central technical contribution is a generalization of expected improvement to noisy observations, derived via a formal decision-theoretic argument rather than ad hoc modifications.
Significance. If the decision-theoretic derivation of noisy expected improvement is correct, the paper supplies a clear, self-contained reference that consolidates established material while adding a principled treatment of noise. The breadth of advanced topics and the explicit contrast with prior ad hoc approaches make it potentially useful for both newcomers and practitioners seeking a unified exposition.
major comments (1)
- The decision-theoretic justification for the noisy-EI generalization is presented as the key novelty, yet the manuscript does not include an explicit reduction showing that the new acquisition function recovers the standard noise-free EI when observation noise variance approaches zero. This step is load-bearing for the claim that the generalization is principled rather than ad hoc.
minor comments (3)
- Notation for the noisy observation model (e.g., y = f(x) + ε) is introduced without a dedicated equation number; cross-referencing would improve readability in the acquisition-function sections.
- Several figures illustrating GP posterior samples and acquisition surfaces lack axis labels or legends indicating the noise level, making it difficult to connect the visuals to the noisy-EI derivation.
- The discussion of software packages in the final section would benefit from explicit version numbers or DOIs for the cited libraries to aid reproducibility.
Simulated Author's Rebuttal
We thank the referee for their careful reading of the manuscript and for the positive assessment. We address the single major comment below.
read point-by-point responses
-
Referee: The decision-theoretic justification for the noisy-EI generalization is presented as the key novelty, yet the manuscript does not include an explicit reduction showing that the new acquisition function recovers the standard noise-free EI when observation noise variance approaches zero. This step is load-bearing for the claim that the generalization is principled rather than ad hoc.
Authors: We agree that an explicit reduction to the noise-free case would make the principled character of the generalization clearer. In the revised manuscript we will add a short derivation (in the main text or an appendix) showing that the proposed acquisition function recovers standard expected improvement in the limit of vanishing observation noise. The argument follows directly from the decision-theoretic construction once the posterior variance contributed by observation noise is set to zero. revision: yes
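A sketch of the limit the authors promise, in generic notation (assumed here, not taken from the manuscript). With observation noise y = f(x) + ε, ε ~ N(0, λ²), a decision-theoretic noisy EI of the one-step Bayes type scores a candidate by the expected gain in the best attainable posterior value,

\nu_n(x) = \mathbb{E}_n\!\left[\, \max_{i \le n+1} \mu_{n+1}(x_i) \;-\; \max_{i \le n} \mu_n(x_i) \,\middle|\, x_{n+1} = x \right],

where the final solution is restricted to evaluated points. As λ → 0 the posterior mean interpolates the observed data, so \max_{i \le n} \mu_n(x_i) = f_n^* and the one-step gain collapses to \mathbb{E}_n[\max(f(x) - f_n^*, 0)], which is exactly the noise-free expected improvement.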
Circularity Check
No significant circularity
full rationale
The paper is a tutorial explaining Gaussian process regression and standard acquisition functions (expected improvement, entropy search, knowledge gradient) before presenting a generalization of expected improvement to noisy observations. This generalization is derived from a decision-theoretic argument that starts from the definition of the acquisition function and the posterior over the objective; the derivation does not reduce to a fitted parameter renamed as a prediction, a self-referential definition, or a load-bearing self-citation. All steps remain self-contained against external benchmarks (standard GP theory and decision theory) with no equations shown to be equivalent to their inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Gaussian process regression supplies a suitable surrogate model that quantifies uncertainty for the objective function
Forward citations
Cited by 22 Pith papers
-
Elicitation-Augmented Bayesian Optimization
A cost-aware value-of-information acquisition function is derived to balance direct observations against noisy pairwise human comparisons in Bayesian optimization, approaching the convex hull of the individual informa...
-
Bayesian Optimization with Structured Measurements: A Vector-Valued RKHS Framework
Proposes a vector-valued RKHS framework for Bayesian optimization with structured measurements, deriving concentration bounds and UCB-based regret guarantees that recover sublinear rates.
-
Learning myopic mixed-integer nonlinear model predictive control from expert demonstrations
A myopic MINMPC framework learns a value function offline via inverse optimization from expert data, allowing short horizons with near-optimal performance and strict integer feasibility online for hybrid systems.
-
Categorical Optimization with Bayesian Anchored Latent Trust Regions for Structural Design under High-Dimensional Uncertainty
COBALT performs direct discrete optimization over high-dimensional categorical structural designs by anchoring latent embeddings as graphs and applying trust-region acquisition on additive Gaussian process surrogates ...
-
An Efficient Spatial Branch-and-Bound Algorithm for Global Optimization of Gaussian Process Posterior Mean Functions
PALM-Mean combines sign-aware piecewise-linear relaxations of locally important kernel terms with closed-form analytic bounds on the rest inside a reduced-space branch-and-bound framework, yielding valid lower bounds ...
-
Collaborative Contextual Bayesian Optimization
CCBO enables collaborative contextual Bayesian optimization across clients with sublinear regret guarantees and shows substantial gains over non-collaborative methods in simulations and a hot rolling application even ...
-
Self-Supervised Laplace Approximation for Bayesian Uncertainty Quantification
SSLA approximates the posterior predictive distribution by refitting Bayesian models on self-predicted data, providing a sampling-free method that improves predictive calibration over classical Laplace approximations ...
-
Ensemble Distributionally Robust Bayesian Optimisation
A tractable ensemble distributionally robust Bayesian optimization method achieves improved sublinear regret bounds under context uncertainty.
-
Bayesian Algorithm for Collaborative Optimization with Application to Aircraft Design
BACO replaces direct black-box calls in collaborative optimization with Gaussian process surrogates at both subsystem and system levels, achieving lower objectives and near-zero constraint violations on MDO benchmarks...
-
Inducing Permutation Invariant Priors in Bayesian Optimization for Carbon Capture and Storage Applications
A novel permutation-invariant GP kernel using set divergence is introduced for Bayesian optimization in CCS well placement and tested on synthetic benchmarks plus one real formation case.
-
HASOD: A Hybrid Adaptive Screening-Optimization Design for High-Dimensional Industrial Experiments
HASOD is a hybrid adaptive framework that unifies factor screening via a new CWESS statistic and response optimization using Gaussian processes, achieving 97% detection accuracy in simulations with asymptotic consiste...
-
On the Tradeoffs of On-Device Generative Models in Federated Predictive Maintenance Systems
Experiments on real industrial time series show that partial model sharing improves diffusion model performance in bandwidth-limited non-IID settings, while full sharing stabilizes GAN training but offers less robustn...
-
ORTHOBO: Orthogonal Bayesian Hyperparameter Optimization
OrthoBO introduces an orthogonal acquisition estimator subtracting an optimally weighted score-function control variate to reduce Monte Carlo variance, preserve the acquisition target, and improve ranking stability in...
-
Harnessing a 256-qubit Neutral Atom Simulator for Graph Classification
A 256-qubit neutral atom simulator computes Quantum Evolution Kernels for graph classification on the PROTEINS dataset, achieving slightly better performance than classical kernels.
-
Caliper-in-the-Loop: Black-Box Optimization for Hyperledger Fabric Performance Tuning
Bayesian optimization with dimensionality reduction improves Hyperledger Fabric throughput by up to 12% in a 317-dimensional configuration space via an automated Caliper benchmarking loop.
-
AgentOpt v0.1 Technical Report: Client-Side Optimization for LLM-Based Agent
AgentOpt introduces a framework-agnostic package that uses algorithms like UCB-E to find cost-effective model assignments in multi-step LLM agent pipelines, cutting evaluation budgets by 62-76% while maintaining near-...
-
Physics-informed automated surface reconstructing via low-energy electron diffraction based on Bayesian optimization
A trust-region Bayesian optimization framework integrates LEED multiple scattering models to jointly optimize structural and experimental parameters for automated surface reconstruction.
-
Closed-Loop CO2 Storage Control With History-Based Reinforcement Learning and Latent Model-Based Adaptation
History-conditioned RL policies recover nearly all privileged-state performance with deployable well data, and latent model-based retuning outperforms direct model-free retuning under abnormal reservoir conditions.
-
Enhancing Model Based Derivative Free Optimization using Direct Search
A hybrid switching approach integrates Direct Search into model-based derivative-free optimization, with a convergence proof for single-objective cases and empirical gains on ML tasks and CUTEr benchmarks.
-
BayMOTH: Bayesian optiMizatiOn with meTa-lookahead -- a simple approacH
BayMOTH unifies meta-Bayesian optimization with a usefulness-based fallback to lookahead, demonstrating competitive results on function optimization tasks even under low task relatedness.
-
Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey
A comprehensive survey of PEFT algorithms for large models, covering their performance, overhead, applications, and real-world system implementations.
-
Efficient and Principled Scientific Discovery through Bayesian Optimization: A Tutorial
Bayesian optimization automates the scientific discovery cycle by modeling observations with surrogate models and using acquisition functions to select experiments that balance known information with new exploration.
Reference graph
Works this paper leans on
-
[1]
Ahmed, M. O., Shahriari, B., and Schmidt, M. (2016). Do we need "harmless" Bayesian optimization and "first-order" Bayesian optimization. In Neural Information Processing Systems 2016 Workshop on Bayesian Optimization
work page 2016
-
[2]
Berger, J. O. (2013). Statistical Decision Theory and Bayesian Analysis . Springer Science & Business Media
work page 2013
-
[3]
Blum, J. R. (1954). Multidimensional stochastic approximation methods. The Annals of Mathematical Statistics , pages 737--744
work page 1954
-
[4]
Booker, A., Dennis, J., Frank, P., Serafini, D., Torczon, V., and Trosset, M. (1999). A rigorous framework for optimization of expensive functions by surrogates . Structural and Multidisciplinary Optimization , 17(1):1--13
work page 1999
-
[5]
Bottou, L. (2012). Stochastic gradient descent tricks. In Montavon, G., Orr, G. B., and Müller, K. R., editors, Neural Networks: Tricks of the Trade, pages 421--436. Springer
work page 2012
-
[6]
Brochu, E., Cora, M., and de Freitas, N. (2009). A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. Technical Report TR-2009-023, Department of Computer Science, University of British Columbia. arXiv:1012.2599
work page Pith review arXiv 2009
-
[7]
Bull, A. D. (2011). Convergence rates of efficient global optimization algorithms. Journal of Machine Learning Research , 12(Oct):2879--2904
work page 2011
-
[8]
Calvin, J. (1997). Average performance of a class of adaptive algorithms for global optimization . The Annals of Applied Probability , 7(3):711--730
work page 1997
-
[9]
Calvin, J. and Žilinskas, A. (2005). One-dimensional global optimization for observations with noise. Computers & Mathematics with Applications, 50(1-2):157--169
work page 2005
-
[10]
Calvin, J. and Žilinskas, A. (1999). On the convergence of the P-algorithm for one-dimensional global optimization of smooth functions. Journal of Optimization Theory and Applications, 102(3):479--495
work page 1999
-
[11]
Calvin, J. and Žilinskas, A. (2000). One-dimensional P-algorithm with convergence rate O(n^{-3+δ}) for smooth functions. Journal of Optimization Theory and Applications, 106(2):297--307
work page 2000
-
[12]
Cashore, J. M., Kumarga, L., and Frazier, P. I. (2016). Multi-step Bayesian optimization for one-dimensional feasibility determination. arXiv preprint arXiv:1607.03195
-
[13]
Chang, P. B., Williams, B. J., Bhalla, K. S. B., Belknap, T. W., Santner, T. J., Notz, W. I., and Bartel, D. L. (2001). Design and analysis of robust total joint replacements: finite element model experiments with environmental variables. Journal of Biomechanical Engineering , 123(3):239--246
work page 2001
-
[14]
Chick, S. E. and Inoue, K. (2001). New two-stage and sequential procedures for selecting the best simulated system. Operations Research , 49(5):732--743
work page 2001
-
[15]
Clark, C. E. (1961). The greatest of a finite set of random variables. Operations Research , 9(2):145--162
work page 1961
-
[16]
Cover, T. M. and Thomas, J. A. (2012). Elements of Information Theory . John Wiley & Sons
work page 2012
-
[17]
Dynkin, E. and Yushkevich, A. (1979). Controlled Markov Processes . Springer, New York
work page 1979
-
[18]
Forrester, A., Sóbester, A., and Keane, A. (2008). Engineering Design via Surrogate Modelling: A Practical Guide. Wiley, West Sussex, UK
work page 2008
-
[19]
Forrester, A. I., Sóbester, A., and Keane, A. J. (2007). Multi-fidelity optimization via surrogate modelling. In Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, volume 463, pages 3251--3269. The Royal Society
work page 2007
-
[20]
Frazier, P., Powell, W., and Dayanik, S. (2009). The knowledge-gradient policy for correlated normal beliefs. INFORMS Journal on Computing , 21(4):599--613
work page 2009
-
[21]
Frazier, P. I. (2012). Tutorial: Optimization via simulation with Bayesian statistics and dynamic programming. In Laroque, C., Himmelspach, J., Pasupathy, R., Rose, O., and Uhrmacher, A. M., editors, Proceedings of the 2012 Winter Simulation Conference, pages 79--94, Piscataway, New Jersey. Institute of Electrical and Electronics Engineers, Inc
work page 2012
-
[22]
Frazier, P. I., Powell, W. B., and Dayanik, S. (2008). A knowledge-gradient policy for sequential information collection. SIAM Journal on Control and Optimization , 47(5):2410--2439
work page 2008
-
[23]
Frazier, P. I. and Wang, J. (2016). Bayesian optimization for materials design. In Lookman, T., Alexander, F. J., and Rajan, K., editors, Information Science for Materials Discovery and Design , pages 45--75. Springer
work page 2016
-
[24]
Gardner, J. R., Kusner, M. J., Xu, Z. E., Weinberger, K. Q., and Cunningham, J. P. (2014). Bayesian optimization with inequality constraints. In ICML , pages 937--945
work page 2014
-
[25]
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B. (2014). Bayesian Data Analysis , volume 2. CRC Press Boca Raton, FL
work page 2014
-
[26]
Ginsbourger, D., Le Riche, R., and Carraro, L. (2007). A multi-points criterion for deterministic parallel global optimization based on kriging . In International Conference on Nonconvex Programming, NCP07 , Rouen, France
work page 2007
-
[27]
Ginsbourger, D., Le Riche, R., and Carraro, L. (2010). Kriging is well-suited to parallelize optimization . In Tenne, Y. and Goh, C. K., editors, Computational Intelligence in Expensive Optimization Problems , volume 2, pages 131--162. Springer
work page 2010
-
[28]
Ginsbourger, D. and Riche, R. (2010). Towards G aussian process-based optimization with finite time horizon. In Giovagnoli, A., Atkinson, A., Torsney, B., and May, C., editors, mODa 9--Advances in Model-Oriented Design and Analysis , pages 89--96. Springer
work page 2010
-
[29]
González, J., Osborne, M., and Lawrence, N. (2016). GLASSES: Relieving the myopia of Bayesian optimisation. In Artificial Intelligence and Statistics, pages 790--799
work page 2016
-
[30]
Groot, P., Birlutiu, A., and Heskes, T. (2010). Bayesian Monte Carlo for the global optimization of expensive functions. In ECAI, pages 249--254
work page 2010
-
[31]
Hennig, P. and Schuler, C. J. (2012). Entropy search for information-efficient global optimization. Journal of Machine Learning Research , 13:1809--1837
work page 2012
-
[32]
Hernández-Lobato, J. M., Gelbart, M. A., Hoffman, M. W., Adams, R. P., and Ghahramani, Z. (2015). Predictive entropy search for Bayesian optimization with unknown constraints. In Proceedings of the 32nd International Conference on International Conference on Machine Learning-Volume 37, pages 1699--1707. JMLR.org
work page 2015
-
[33]
Hernández-Lobato, J. M., Hoffman, M. W., and Ghahramani, Z. (2014). Predictive entropy search for efficient global optimization of black-box functions. In Advances in Neural Information Processing Systems, pages 918--926
work page 2014
-
[34]
Ho, Y.-C., Cao, X., and Cassandras, C. (1983). Infinitesimal and finite perturbation analysis for queueing networks. Automatica , 19(4):439--445
work page 1983
-
[35]
Huang, D., Allen, T., Notz, W., and Miller, R. (2006). Sequential kriging optimization using multiple-fidelity evaluations . Structural and Multidisciplinary Optimization , 32(5):369--382
work page 2006
-
[36]
Jedynak, B., Frazier, P. I., and Sznitman, R. (2012). Twenty questions with noise: B ayes optimal policies for entropy loss. Journal of Applied Probability , 49(1):114--136
work page 2012
-
[37]
Jones, D. R., Schonlau, M., and Welch, W. J. (1998). Efficient global optimization of expensive black-box functions. Journal of Global Optimization, 13(4):455--492
work page 1998
-
[38]
Ju, S., Shiga, T., Feng, L., Hou, Z., Tsuda, K., and Shiomi, J. (2017). Designing nanostructures for phonon transport via Bayesian optimization. Physical Review X, 7
work page 2017
-
[39]
Kaelbling, L. P., Littman, M. L., and Moore, A. W. (1996). Reinforcement learning: A survey. Journal of Artificial Intelligence Research , 4:237--285
work page 1996
-
[40]
Kandasamy, K., Dasarathy, G., Oliva, J. B., Schneider, J., and Póczos, B. (2016). Gaussian process bandit optimisation with multi-fidelity evaluations. In Advances in Neural Information Processing Systems, pages 992--1000
work page 2016
-
[41]
Kandasamy, K., Schneider, J., and Póczos, B. (2015). High dimensional Bayesian optimisation and bandits via additive models. In International Conference on Machine Learning, pages 295--304
work page 2015
-
[42]
Keane, A. (2006). Statistical improvement criteria for use in multiobjective design optimization . AIAA Journal , 44(4):879--891
work page 2006
-
[43]
Kersting, K., Plagemann, C., Pfaff, P., and Burgard, W. (2007). Most likely heteroscedastic Gaussian process regression. In Proceedings of the 24th International Conference on Machine Learning, pages 393--400. ACM
work page 2007
-
[44]
Kleijnen, J. P. et al. (2008). Design and Analysis of Simulation Experiments , volume 20. Springer
work page 2008
- [45]
-
[46]
Knowles, J. (2006). ParEGO: A hybrid algorithm with on-line landscape approximation for expensive multiobjective optimization problems . IEEE Transactions on Evolutionary Computation , 10(1):50--66
work page 2006
-
[47]
Kushner, H. J. (1964). A new method of locating the maximum point of an arbitrary multipeak curve in the presence of noise. Journal of Basic Engineering , 86(1):97--106
work page 1964
-
[48]
Lam, R., Allaire, D. L., and Willcox, K. E. (2015). Multifidelity optimization using statistical surrogate modeling for non-hierarchical information sources. In 56th AIAA/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference , page 0143
work page 2015
-
[49]
Lam, R., Willcox, K., and Wolpert, D. H. (2016). Bayesian optimization with a finite budget: An approximate dynamic programming approach. In Advances in Neural Information Processing Systems , pages 883--891
work page 2016
-
[50]
Liu, D. C. and Nocedal, J. (1989). On the limited memory BFGS method for large scale optimization. Mathematical Programming , 45(1-3):503--528
work page 1989
-
[51]
Lizotte, D. (2008). Practical Bayesian Optimization . PhD thesis, University of Alberta
work page 2008
-
[52]
Lizotte, D., Wang, T., Bowling, M., and Schuurmans, D. (2007). Automatic gait optimization with Gaussian process regression. In Proceedings of IJCAI, pages 944--949
work page 2007
-
[53]
Mahajan, A. and Teneketzis, D. (2008). Multi-armed bandit problems. In Hero, A., Castañón, D., Cochran, D., and Kastella, K., editors, Foundations and Applications of Sensor Management, pages 121--151. Springer
work page 2008
-
[54]
Martí, R., Lozano, J. A., Mendiburu, A., and Hernando, L. (2016). Multi-start methods. Handbook of Heuristics, pages 1--21
work page 2016
-
[55]
McLeod, M., Osborne, M. A., and Roberts, S. J. (2017). Practical Bayesian optimization for variable cost objectives. arXiv preprint arXiv:1703.04335
-
[56]
Mehdad, E. and Kleijnen, J. P. (2018). Efficient global optimisation for black-box simulation via sequential intrinsic kriging. Journal of the Operational Research Society , 69:1--13
work page 2018
-
[57]
Milgrom, P. and Segal, I. (2002). Envelope theorems for arbitrary choice sets. Econometrica , 70(2):583--601
work page 2002
-
[58]
Minka, T. P. (2001). A family of algorithms for approximate Bayesian inference. PhD thesis, Massachusetts Institute of Technology
work page 2001
-
[59]
Močkus, J. (1975). On Bayesian methods for seeking the extremum. In Optimization Techniques IFIP Technical Conference, pages 400--404. Springer
work page 1975
-
[60]
Močkus, J. (1989). Bayesian Approach to Global Optimization: Theory and Applications. Kluwer Academic Publishers
work page 1989
-
[61]
Močkus, J. and Močkus, L. (1991). Bayesian approach to global optimization and application to multiobjective and constrained problems. Journal of Optimization Theory and Applications, 70(1):157--172
work page 1991
-
[62]
Močkus, J., Tiesis, V., and Žilinskas, A. (1978). The application of Bayesian methods for seeking the extremum. In Dixon, L. and Szego, G., editors, Towards Global Optimisation, volume 2, pages 117--129. Elsevier Science Ltd., North Holland, Amsterdam
work page 1978
-
[63]
Neal, R. M. (2003). Slice sampling. Annals of Statistics , 31(3):705--741
work page 2003
-
[64]
Negoescu, D. M., Frazier, P. I., and Powell, W. B. (2011). The knowledge gradient algorithm for sequencing experiments in drug discovery. INFORMS Journal on Computing, 23(1):346--363
work page 2011
-
[65]
Osborne, M. A., Garnett, R., and Roberts, S. J. (2009). Gaussian processes for global optimization. In 3rd International Conference on Learning and Intelligent Optimization (LION3), pages 1--15. Citeseer
work page 2009
-
[66]
Packwood, D. (2017). Bayesian Optimization for Materials Science , volume 3. Springer
work page 2017
-
[67]
Perez, S. (2015). Twitter acquires machine learning startup whetlab. TechCrunch . Accessed July 3, 2018
work page 2015
-
[68]
Poloczek, M., Wang, J., and Frazier, P. (2017). Multi-information source optimization. In Advances in Neural Information Processing Systems , pages 4291--4301
work page 2017
-
[69]
Powell, W. B. (2007). Approximate Dynamic Programming: Solving the Curses of Dimensionality . John Wiley & Sons, New York
work page 2007
-
[70]
Rasmussen, C. and Williams, C. (2006). Gaussian Processes for Machine Learning . MIT Press, Cambridge, MA
work page 2006
-
[71]
Regis, R. and Shoemaker, C. (2005). Constrained global optimization of expensive black box functions using radial basis functions. Journal of Global Optimization , 31(1):153--171
work page 2005
-
[72]
Regis, R. and Shoemaker, C. (2007a). Improved strategies for radial basis function methods for global optimization. Journal of Global Optimization , 37(1):113--135
-
[73]
Regis, R. and Shoemaker, C. (2007b). Parallel radial basis function methods for the global optimization of expensive functions. European Journal of Operational Research , 182(2):514--535
-
[74]
Robbins, H. and Monro, S. (1951). A stochastic approximation method. The Annals of Mathematical Statistics , 22(3):400--407
work page 1951
-
[75]
Roustant, O., Ginsbourger, D., and Deville, Y. (2012). DiceKriging, DiceOptim: Two R packages for the analysis of computer experiments by kriging-based metamodeling and optimization. Journal of Statistical Software, Articles, 51(1):1--55
work page 2012
-
[76]
Salemi, P., Nelson, B. L., and Staum, J. (2014). Discrete optimization via simulation using Gaussian Markov random fields. In Proceedings of the 2014 Winter Simulation Conference, pages 3809--3820. IEEE Press
work page 2014
-
[77]
Sasena, M. (2002). Flexibility and Efficiency Enhancements for Constrained Global Design Optimization with Kriging Approximations . PhD thesis, University of Michigan
work page 2002
-
[78]
Schonlau, M., Welch, W. J., and Jones, D. R. (1998). Global versus local search in constrained optimization of computer models. Lecture Notes --- Monograph Series , 34:11--25
work page 1998
-
[79]
Scott, W., Frazier, P. I., and Powell, W. B. (2011). The correlated knowledge gradient for simulation optimization of continuous parameters using Gaussian process regression. SIAM Journal on Optimization, 21(3):996--1026
work page 2011
-
[80]
Seko, A., Togo, A., Hayashi, H., Tsuda, K., Chaput, L., and Tanaka, I. (2015). Prediction of low-thermal-conductivity compounds with first-principles anharmonic lattice-dynamics calculations and Bayesian optimization. Physical Review Letters, 115
work page 2015
discussion (0)