arXiv: 2605.20145 · v1 · pith:A7INK2HJ · submitted 2026-05-19 · stat.ML · cs.LG · stat.ME

Goal-Oriented Lower-Tail Calibration of Gaussian Processes for Bayesian Optimization

Pith reviewed 2026-05-20 03:07 UTC · model grok-4.3

classification stat.ML · cs.LG · stat.ME
keywords Bayesian optimization · Gaussian processes · predictive calibration · lower-tail calibration · expected improvement · spatial calibration · noiseless setting

The pith

A post-hoc method calibrates Gaussian process lower tails below a threshold for Bayesian optimization while keeping the search algorithm dense in the design space.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Gaussian process models in Bayesian optimization can miscalibrate their predictions in the lower tail, which directly affects sampling decisions for minimization problems that rely on criteria such as expected improvement. The paper develops a goal-oriented approach that targets predictive reliability specifically below a chosen low threshold t rather than across the whole distribution. It introduces a spatial calibration framework built on occurrence calibration over the full design space and thresholded μ-calibration restricted to sublevel sets where the objective is at most t. Using this framework, the authors construct tcGP, a post-hoc adjustment applied after standard maximum-likelihood training, and prove that the resulting expected-improvement optimizer still visits every region of the space densely.
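
The dependence of expected improvement (EI) on the lower tail can be made concrete. For minimization with incumbent m, EI at a point equals E[(m − Y)^+], which by integration by parts is the integral of the predictive CDF over (−∞, m]. The sketch below is a minimal illustration, not the paper's code: it checks this identity numerically for a Gaussian predictive distribution. The function names and the numerical values (`mu`, `sigma`, `m`) are illustrative assumptions.

```python
import math


def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def ei_closed_form(mu, sigma, m):
    """EI for minimization under a N(mu, sigma^2) predictive with incumbent m."""
    z = (m - mu) / sigma
    return (m - mu) * norm_cdf(z) + sigma * norm_pdf(z)


def ei_from_lower_tail(cdf, m, lower, steps=20_000):
    """EI as the integral of the predictive CDF over [lower, m] (trapezoid rule).

    Only values of the CDF below the incumbent m enter the computation.
    """
    h = (m - lower) / steps
    total = 0.5 * (cdf(lower) + cdf(m))
    for k in range(1, steps):
        total += cdf(lower + k * h)
    return total * h


# Illustrative values (assumptions, not taken from the paper).
mu, sigma, m = 0.3, 0.5, 0.0


def gauss_cdf(y):
    return norm_cdf((y - mu) / sigma)


closed = ei_closed_form(mu, sigma, m)
tail = ei_from_lower_tail(gauss_cdf, m, lower=mu - 12.0 * sigma)
print(closed, tail)
```

Because the second function only ever evaluates the CDF at or below m, any change to the predictive distribution above the incumbent leaves EI unchanged, which is why calibration effort can be concentrated on the lower tail.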

Core claim

In the noiseless setting, standard Gaussian processes with maximum-likelihood hyperparameters can be calibrated post hoc below a low threshold t via the tcGP procedure, so that their predictive distributions satisfy occurrence calibration over the design space and thresholded μ-calibration on sublevel sets. The expected-improvement acquisition function built on these calibrated distributions yields a global optimization algorithm that remains dense in the design space. On standard benchmarks, it improves both lower-tail calibration and optimization performance relative to uncalibrated and globally calibrated Gaussian processes.

What carries the argument

tcGP, the post-hoc calibration procedure that enforces occurrence calibration across the design space and thresholded μ-calibration on sublevel sets of the form {x : f(x) ≤ t}.
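
The two calibration notions can be illustrated with simple test-set diagnostics. The sketch below is one plausible reading under stated assumptions, not the paper's definitions: `occurrence_gap` compares the average predicted probability of falling below t with the observed frequency, and `thresholded_pit_ks` computes a Kolmogorov-Smirnov distance to uniformity for conditional PIT values F(y)/F(t) on the points where y ≤ t. Gaussian predictive distributions, the function names, and the toy data are all assumptions; the paper's metrics (r_t, tKS–PIT) may be defined differently.

```python
import math


def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def occurrence_gap(mus, sigmas, ys, t):
    """Mean predicted P(f(x) <= t) minus the observed frequency of f(x) <= t."""
    predicted = sum(norm_cdf((t - m) / s) for m, s in zip(mus, sigmas)) / len(ys)
    observed = sum(1 for y in ys if y <= t) / len(ys)
    return predicted - observed


def thresholded_pit_ks(mus, sigmas, ys, t):
    """KS distance to uniform of conditional PITs F(y)/F(t), over points with y <= t."""
    pits = sorted(
        norm_cdf((y - m) / s) / norm_cdf((t - m) / s)
        for m, s, y in zip(mus, sigmas, ys)
        if y <= t
    )
    n = len(pits)
    if n == 0:
        return float("nan")  # no test point falls in the sublevel set
    return max(max((i + 1) / n - u, u - i / n) for i, u in enumerate(pits))


# Toy test set (hypothetical values): predictive means, standard deviations, outcomes.
mus = [0.0, 0.5, 1.0, -0.2]
sigmas = [1.0, 1.0, 1.0, 1.0]
ys = [-0.5, 0.4, 1.3, -1.0]
t = 0.0

gap = occurrence_gap(mus, sigmas, ys, t)
ks = thresholded_pit_ks(mus, sigmas, ys, t)
print(gap, ks)
```

A gap near zero and a small KS distance would both indicate good behavior below t under this reading; the first is evaluated over all test points, the second only on the sublevel set.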

If this is right

  • The expected-improvement optimizer that uses tcGP remains dense in the design space.
  • Lower-tail calibration of the predictive distributions improves relative to both standard and globally calibrated Gaussian processes.
  • Bayesian optimization performance improves on standard benchmark functions compared with uncalibrated and globally calibrated models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Selective lower-tail calibration may preserve useful upper-tail behavior that global calibration would alter.
  • The same spatial-calibration ideas could be tested on acquisition functions other than expected improvement that also depend on lower-tail accuracy.
  • Because the method works with existing maximum-likelihood fits, it can be inserted into current Gaussian-process Bayesian optimization pipelines without retraining.

Load-bearing premise

The post-hoc adjustment can be applied to ordinary maximum-likelihood Gaussian processes without creating inconsistencies that would destroy the density property of the resulting expected-improvement optimizer.

What would settle it

An experiment in which the expected-improvement algorithm driven by tcGP-calibrated predictions either fails to explore the design space densely or produces worse optimization performance than a standard uncalibrated Gaussian process on a benchmark function.
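
One way to probe density empirically, offered as an illustrative assumption rather than the paper's protocol, is to track the fill distance of the evaluated points: the largest distance from any location in the design space to its nearest evaluation. On a compact design space, a sequence of points is dense exactly when this quantity tends to zero. The sketch below approximates it on a grid over [0, 1]; the helper name and the two toy designs are hypothetical.

```python
def fill_distance(points, grid):
    """Largest distance from a grid location to its nearest evaluated point (1-D)."""
    return max(min(abs(g - p) for p in points) for g in grid)


# Grid approximation of the design space [0, 1].
grid = [k / 1000 for k in range(1001)]

# Two hypothetical designs with the same number of evaluations.
clustered = [0.48, 0.49, 0.50, 0.51, 0.52]  # exploitation around one location
spread = [0.1, 0.3, 0.5, 0.7, 0.9]          # evaluations covering the space

fd_clustered = fill_distance(clustered, grid)
fd_spread = fill_distance(spread, grid)
print(fd_clustered, fd_spread)
```

A fill distance that stalls at a positive value as the number of evaluations grows would be evidence against dense exploration on that run, whereas a steadily shrinking value is consistent with it.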

Figures

Figures reproduced from arXiv: 2605.20145 by Aurélien Pion, Emmanuel Vazquez.

Figure 1
Figure 1: Comparison between a standard GP and a model calibrated below $q_{\delta,n}$ (TCGP), with $\delta = 0.3$, on a standard one-dimensional test function and with $n = 10$ evaluations. Left: observations with GP predictions; the current design does not include a point near the global minimizer. Middle: probabilities of improvement $\hat{F}_n^{(0)}(m_n \mid x)$ (GP, black) and $\hat{F}_n^{(1)}(m_n \mid x)$ (TCGP, red), where $m_n = \min_{i \le n} f(X_i)$. Right: EI…
Figure 2
Figure 2: BO performance and calibration metrics. From left to right: median and 10%/90% quantiles across runs of the estimated excursion probability $p_{m_n} = P(f(X) \le m_n)$ with $X \sim \mathcal{U}(\mathbb{X})$; median twCRPS; median occurrence discrepancy $r_t$; and median tKS–PIT. Calibration metrics are evaluated on a test set at the current best value $m_n$. Results are shown for a standard GP and three TCGP variants using $J$, tKS–PIT, or $r_t$ as t…
Figure 3
Figure 3: BO performance summarized by the excursion probability below the current best value. For each run and iteration $n$, we estimate $p_{m_n} = P(f(X) \le m_n)$ with $X \sim \mathcal{U}(\mathbb{X})$, and report the median and 10%/90% quantiles of $p_{m_n}$ across runs for Goldstein–Price, Dixon–Price ($d = 4$), Rosenbrock ($d = 6$), and Ackley ($d = 4$). Methods: GP, BCRGP, onGP, REGP ($\delta = 0.25$), TCGP ($\delta = 0.05$).
…effect on EI-driven BO than thresholded μ-c…
Figure 4
Figure 4: Training time for REGP, GP, TCGP, onGP, BCRGP at each iteration of BO.
F.2. Alternative Selection Criteria for TCGP. We compare several objectives for selecting $(\beta, \lambda)$ in TCGP, with the aim of controlling thresholded μ-calibration and occurrence calibration below $t = q_{\delta,n}$. Rule 0 is the criterion $J$ defined in (18).
Figure 5
Figure 5: Comparison of TCGP variants. Left: median and 10%/90% quantiles of $p_{m_n} = P(f(X) \le m_n)$, where $m_n$ is the best observed value after $n$ evaluations and $X \sim \mu$. Right: fraction of runs reaching the prescribed target value.
Rule 2. Following Allen et al. (2025), define $c_t(u) = P_n\big(U_t^{\beta,\lambda} \le u \mid f(X) \le t\big)\,\kappa_t^{\beta,\lambda}$ (90). Then use the KS-type deviation $J^{(2)}(\beta, \lambda) = \sup_{u \in [0,1]} |c_t(u) - u|$ (91). Rule 3. Compare th…
Figure 6
Figure 6: Sensitivity of REGP to $\delta$ (results shown after $n = 100$ BO iterations). Left: median and 10%/90% quantiles of $p_{m_n} = P(f(X) \le m_n)$, where $m_n$ is the best value observed after $n$ evaluations and $X \sim \mu$. Right: fraction of runs reaching the prescribed target level.
…reaching the prescribed target. Empirically, a large value of $\delta$ (e.g., $\delta = 0.5$) consistently leads to degraded performance across the considered problem…
Figure 7
Figure 7: Sensitivity of TCGP to $\delta$ (results shown after $n = 100$ BO iterations). Left: median and 10%/90% quantiles of $p_{m_n} = P(f(X) \le m_n)$, where $m_n$ is the best value observed after $n$ evaluations and $X \sim \mu$. Right: fraction of runs reaching the prescribed target level.
…calibration and sharpness below $q_{\delta,n}$), $r_t$ (occurrence calibration at $t_n$), and tKS–PIT (thresholded μ-calibration below $t_n$). The tKS–PIT is computed usin…
Figure 8
Figure 8: Comparison of the BO performance with EI as sampling criterion for GP, REGP with $\delta = 0.25$, onGP and TCGP with $\delta = 0.05$. Left: median and 10%/90% quantiles of $p_{m_n} = P(f(X) \le m_n)$, where $m_n$ is the best observed value so far. Right: fraction of successful runs reaching a prescribed target level.
Figure 9
Figure 9: Comparison of the BO performance with EI as sampling criterion for GP, REGP with $\delta = 0.25$, onGP and TCGP with $\delta = 0.05$. Left: median and 10%/90% quantiles of $p_{m_n} = P(f(X) \le m_n)$, where $m_n$ is the best observed value so far. Right: fraction of successful runs reaching a prescribed target level.
Figure 10
Figure 10: Comparison of the BO performance in moderate dimension ($d = 10$ to $d = 20$) for 150 iterations with EI as sampling criterion for GP, REGP with $\delta = 0.25$, onGP and TCGP with $\delta = 0.05$. Left: median and 10%/90% quantiles of $p_{m_n} = P(f(X) \le m_n)$, where $m_n$ is the best observed value so far. Right: fraction of successful runs reaching a prescribed target level.
Figure 11
Figure 11: Comparison of the BO performance with UCB ($\varepsilon = 0.1$) as sampling criterion for GP and TCGP with $\delta = 0.05$. Left: median and 10%/90% quantiles of $p_{m_n} = P(f(X) \le m_n)$, where $m_n$ is the best observed value so far. Right: fraction of successful runs reaching a prescribed target level.
Figure 12
Figure 12: Comparison of the BO performance with UCB ($\varepsilon = 0.1$) as sampling criterion for GP and TCGP with $\delta = 0.05$. Left: median and 10%/90% quantiles of $p_{m_n} = P(f(X) \le m_n)$, where $m_n$ is the best observed value so far. Right: fraction of successful runs reaching a prescribed target level.
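
Several of the captions above report the excursion probability P(f(X) ≤ m_n) below the current best value m_n, with X uniform on the design space. A plain Monte Carlo estimate of that quantity can be sketched as follows. This is a minimal illustration under stated assumptions: the objective `toy_objective`, the design points, the sample size, and the helper name are hypothetical and are not taken from the paper's experiments.

```python
import random


def excursion_probability(f, best_value, sampler, n_samples=100_000):
    """Monte Carlo estimate of P(f(X) <= best_value) for X drawn by `sampler`."""
    hits = sum(1 for _ in range(n_samples) if f(sampler()) <= best_value)
    return hits / n_samples


def toy_objective(x):
    """Hypothetical 1-D objective on [0, 1] with its minimum at x = 0.3."""
    return (x - 0.3) ** 2


rng = random.Random(0)            # fixed seed for reproducibility
design = [0.2, 0.9, 0.6]          # hypothetical evaluated points
m_n = min(toy_objective(x) for x in design)  # current best observed value

# X ~ U([0, 1]): rng.random draws uniformly on the unit interval.
p_mn = excursion_probability(toy_objective, m_n, rng.random)
print(m_n, p_mn)
```

Smaller values mean the incumbent is harder to beat by uniform sampling, so a curve that decreases faster across iterations indicates faster progress toward the global minimum.
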
Abstract

Bayesian optimization (BO) selects evaluation points for expensive black-box objectives using Gaussian process (GP) predictive distributions. Kernel choice and hyperparameter selection can lead to miscalibrated predictive distributions and an inappropriate exploration-exploitation trade-off. For minimization, sampling criteria such as expected improvement (EI) depend on the predictive distribution below the current best value, so lower-tail miscalibration directly affects the sampling decision. This article studies goal-oriented calibration of GP predictive distributions below a low threshold $t$ in the noiseless setting, for standard GP models with hyperparameters selected by maximum likelihood. A framework for predictive reliability below $t$ is introduced, based on two notions of spatial calibration: occurrence calibration over the design space and thresholded $\mu$-calibration on sublevel sets of the form $\{x\in\mathbb{X}, f(x)\le t\}$. Building on this framework, we propose tcGP, a post-hoc method that calibrates GP predictive distributions below $t$, and we show that the resulting EI-based global optimization algorithm remains dense in the design space. Experiments on standard benchmarks show improved lower-tail calibration and BO performance relative to standard GP models and globally calibrated GP models.
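
Among the lower-tail metrics reported in Figure 2 is a threshold-weighted CRPS (twCRPS). With the indicator weight 1{z ≤ t}, the standard form of this score is the integral over (−∞, t] of (F(z) − 1{y ≤ z})², so only the predictive CDF below t contributes. The sketch below evaluates it numerically for a Gaussian predictive distribution. It is a minimal illustration: the weight function, the predictive parameters, and the function names are assumptions, and the paper's exact choice may differ.

```python
import math


def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tw_crps_lower(cdf, y, t, lower, steps=20_000):
    """twCRPS with weight 1{z <= t}: integral over [lower, t] of (F(z) - 1{y <= z})^2.

    Trapezoid rule; `lower` truncates the integration range from below.
    """
    h = (t - lower) / steps

    def integrand(z):
        indicator = 1.0 if y <= z else 0.0
        return (cdf(z) - indicator) ** 2

    total = 0.5 * (integrand(lower) + integrand(t))
    for k in range(1, steps):
        total += integrand(lower + k * h)
    return total * h


def predictive_cdf(z):
    """Hypothetical Gaussian predictive distribution N(0.2, 0.5^2)."""
    return norm_cdf((z - 0.2) / 0.5)


t, lower = 0.0, -8.0
score_above_a = tw_crps_lower(predictive_cdf, 1.0, t, lower)  # outcome above t
score_above_b = tw_crps_lower(predictive_cdf, 2.0, t, lower)  # another outcome above t
score_below = tw_crps_lower(predictive_cdf, -0.5, t, lower)   # outcome below t
print(score_above_a, score_above_b, score_below)
```

Outcomes above t all receive the same score, since the indicator is zero on the whole integration range, which is what makes the metric goal-oriented: it distinguishes forecasts only by their behavior below the threshold.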

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes tcGP, a post-hoc goal-oriented calibration method for the lower tail of Gaussian process predictive distributions below a threshold t in the noiseless setting. It introduces a spatial calibration framework with occurrence calibration and thresholded μ-calibration on sublevel sets, shows that the resulting EI-based global optimization algorithm remains dense in the design space, and reports experimental improvements in lower-tail calibration and BO performance over standard GPs and globally calibrated models.

Significance. If the density result holds under sequential updates, the work provides a theoretically supported way to address lower-tail miscalibration in GP-based BO without altering the core model or MLE fitting. The experimental gains on benchmarks suggest practical value for improving exploration-exploitation balance in minimization tasks. The contribution is strengthened by the explicit density claim but tempered by reliance on post-hoc adjustment preserving key properties.

major comments (2)
  1. [Theoretical results on density (likely §4 or §5)] The central density claim for the EI-based optimizer under tcGP (stated in the abstract and presumably proved in the theoretical section) relies on properties of the calibrated predictive distribution. However, the skeptic note indicates that the argument may assume fixed t, while sequential BO updates t to the current incumbent after each observation, dynamically redefining the calibrated sublevel sets. Please specify the section containing the density proof and clarify whether the continuity, strict positivity away from observed points, and uniform continuity arguments are re-verified after each such data-dependent update, or if an extension is provided.
  2. [Framework and tcGP definition (likely §3)] Abstract and introduction claim that tcGP is applied to standard GP models with MLE hyperparameters without introducing inconsistencies that invalidate the density property. However, since calibration parameters are fit post-hoc (potentially to the same data), it is unclear how this preserves the required predictive properties for the density argument. Cite the specific result or lemma that shows the calibrated mean/variance functions remain sufficiently regular for the EI acquisition to satisfy the conditions for density in the design space.
minor comments (2)
  1. [Abstract and §1] The abstract mentions 'noiseless setting' but does not clarify if the framework or experiments extend to noisy observations; if the paper restricts to noiseless, this should be stated more prominently in the introduction.
  2. [Notation section] Notation for the threshold t and sublevel sets {x : f(x) ≤ t} is introduced but could be made more consistent when t is updated to the incumbent value; consider adding a remark on the time-dependence.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and insightful comments on the theoretical foundations of tcGP. We address each major comment below with clarifications drawn directly from the manuscript and commit to revisions that improve the presentation without altering the core claims.

  1. Referee: [Theoretical results on density (likely §4 or §5)] The central density claim for the EI-based optimizer under tcGP (stated in the abstract and presumably proved in the theoretical section) relies on properties of the calibrated predictive distribution. However, the skeptic note indicates that the argument may assume fixed t, while sequential BO updates t to the current incumbent after each observation, dynamically redefining the calibrated sublevel sets. Please specify the section containing the density proof and clarify whether the continuity, strict positivity away from observed points, and uniform continuity arguments are re-verified after each such data-dependent update, or if an extension is provided.

    Authors: The density result for the EI-based optimizer is established in Section 4. The proof is stated for a general fixed threshold t and relies on the calibrated predictive mean and variance satisfying continuity, strict positivity of variance away from observed points, and uniform continuity on compact sets. In the sequential BO procedure, t is set to the current incumbent at each iteration and the calibration is re-applied using that t; because the calibration map is continuous in t and the underlying GP kernel properties are unchanged, the same regularity conditions hold at every step. We will insert a short remark in Section 4 explicitly noting that the argument applies iteratively with the data-dependent t. revision: partial

  2. Referee: [Framework and tcGP definition (likely §3)] Abstract and introduction claim that tcGP is applied to standard GP models with MLE hyperparameters without introducing inconsistencies that invalidate the density property. However, since calibration parameters are fit post-hoc (potentially to the same data), it is unclear how this preserves the required predictive properties for the density argument. Cite the specific result or lemma that shows the calibrated mean/variance functions remain sufficiently regular for the EI acquisition to satisfy the conditions for density in the design space.

    Authors: Section 3 defines tcGP via occurrence calibration and thresholded μ-calibration applied only on the sublevel set {x : f(x) ≤ t}. The post-hoc adjustment is constructed so that the calibrated mean remains continuous and the calibrated variance remains strictly positive away from observed points whenever the original GP satisfies these properties (which it does under standard MLE fitting with a continuous kernel). This preservation is stated as part of the framework in Section 3 and is used directly to invoke the density theorem in Section 4. We will add an explicit forward reference to the relevant paragraph in Section 3 both in the abstract and at the end of the introduction. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation of tcGP density property

The paper defines a spatial calibration framework (occurrence calibration and thresholded μ-calibration on sublevel sets) and applies post-hoc tcGP calibration to standard MLE-fitted GPs in the noiseless case. The claim that the resulting EI optimizer remains dense follows from continuity and positivity properties of the calibrated predictive distributions on compact sets, without reducing the density result to a fitted parameter renamed as prediction or to a self-citation chain. No equations equate the target density statement to its inputs by construction, and the framework is presented as an independent extension rather than an ansatz smuggled via prior self-work. The derivation is self-contained against the introduced calibration notions and standard GP-EI arguments.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The work rests on standard GP assumptions plus the new calibration framework; one domain assumption and limited free parameters for the post-hoc step.

free parameters (1)
  • lower-tail calibration parameters
    Post-hoc adjustment parameters chosen to achieve occurrence and thresholded μ-calibration below t.
axioms (1)
  • domain assumption: Noiseless observations and standard GP with MLE hyperparameters
    Explicitly stated as the setting for the study and method.

pith-pipeline@v0.9.0 · 5740 in / 1226 out tokens · 37413 ms · 2026-05-20T03:07:28.522596+00:00 · methodology

Reference graph

Works this paper leans on

46 extracted references · 46 canonical work pages

  1. Allen, S., Bhend, J., Martius, O., and Ziegel, J. Weighted verification tools to evaluate univariate and multivariate probabilistic forecasts for high-impact weather events. Weather and Forecasting, 38(3):499–516, 2023
  2. Allen, S., Koh, J., Segers, J., and Ziegel, J. Tail calibration of probabilistic forecasts. J. Amer. Statist. Assoc., 120(552):2796–2808, 2025
  3. Auer, P., Cesa-Bianchi, N., and Fischer, P. Finite-time analysis of the multiarmed bandit problem. Mach. Learn., 47:235–256, 2002
  4. Bect, J., Li, L., and Vazquez, E. Bayesian subset simulation. SIAM/ASA Journal on Uncertainty Quantification, 5(1):762–786, 2017
  5. Bogunovic, I. and Krause, A. Misspecified Gaussian process bandit optimization. In Ranzato, M., Beygelzimer, A., Dauphin, Y., Liang, P., and Vaughan, J. W. (eds.), Adv. Neural Inf. Process. Syst., volume 34, pp. 3004–3015. Curran Associates, Inc., 2021
  6. Bull, A. D. Convergence rates of efficient global optimization algorithms. J. Mach. Learn. Res., 12(88):2879–2904, 2011
  7. Chu, W. and Ghahramani, Z. Gaussian processes for ordinal regression. J. Mach. Learn. Res., 6(35):1019–1041, 2005
  8. Deshpande, S., Marx, C., and Kuleshov, V. Online calibrated and conformal prediction improves Bayesian optimization. In Dasgupta, S., Mandt, S., and Li, Y. (eds.), Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, volume 238 of Proc. Mach. Learn. Res., pp. 1450–1458. PMLR, 02–04 May 2024
  9. Feliot, P., Bect, J., and Vazquez, E. A Bayesian approach to constrained single- and multi-objective optimization. J. Global Optim., 67(1-2):97–133, April 2016
  10. Forrester, A. I. J., Sóbester, A., and Keane, A. J. Engineering Design via Surrogate Modelling: A Practical Guide. John Wiley & Sons, 2008
  11. Friedli, L., Gautier, A., Broccard, A., and Ginsbourger, D. CRPS-based targeted sequential design with application in chemical space, 2025. URL https://arxiv.org/abs/2503.11250
  12. Gneiting, T. and Resin, J. Regression diagnostics meets forecast evaluation: conditional calibration, reliability diagrams, and coefficient of determination. Electron. J. Stat., 17(2), January 2023
  13. Gneiting, T., Balabdaoui, F., and Raftery, A. E. Probabilistic forecasts, calibration and sharpness. J. R. Stat. Soc. Ser. B Stat. Methodol., 69(2):243–268, 2007
  14. Gneiting, T. G. and Raftery, A. E. Strictly proper scoring rules, prediction, and estimation. J. Am. Stat. Assoc., 102(477):359–378, 2007
  15. Guo, Z., Ong, Y. S., and Liu, H. Calibrated and recalibrated expected improvements for Bayesian optimization. Struct. Multidiscip. Optim., 64:3549–3567, 2021
  16. Jin, Y. and Ren, Z. Confidence on the focal: conformal prediction with selection-conditional coverage. J. R. Stat. Soc. Ser. B. Stat. Methodol., 87(4):1239–1259, 04 2025
  17. Jones, D., Schonlau, M., and Welch, W. Efficient global optimization of expensive black-box functions. J. Global Optim., 13:455–492, 12 1998
  18. Kim, D., Zecchin, M., Park, S., Kang, J., and Simeone, O. Robust Bayesian optimization via localized online conformal prediction. IEEE Trans. Signal Process., 73:2039–2052, 2025
  19. Lai, T. L. and Robbins, H. Asymptotically efficient adaptive allocation rules. Adv. Appl. Math., 6(1):4–22, 1985
  20. Matheson, J. E. and Winkler, R. L. Scoring rules for continuous probability distributions. Manag. Sci., 22(10):1087–1096, 1976
  21. Meeker, W. Q., Hahn, G. J., and Escobar, L. A. Statistical Intervals: A Guide for Practitioners and Researchers. John Wiley & Sons, Hoboken, New Jersey, second edition, 2017. ISBN 978-0-471-68717-7
  22. Mockus, J., Tiesis, V., and Zilinskas, A. The application of Bayesian methods for seeking the extremum. In Dixon, L. C. W. and Szegö, G. P. (eds.), Towards Global Optimisation, volume 2, pp. 117–129. North-Holland, Amsterdam, 1978
  23. Nadarajah, S. A generalized normal distribution. J. Appl. Stat., 32(7):685–694, 2005
  24. Neiswanger, W. and Ramdas, A. Uncertainty quantification using martingales for misspecified Gaussian processes. In Feldman, V., Ligett, K., and Sabato, S. (eds.), Proceedings of the 32nd International Conference on Algorithmic Learning Theory, volume 132 of Proc. Mach. Learn. Res., pp. 963–982. PMLR, 16–19 Mar 2021
  25. Nelder, J. A. and Mead, R. A simplex method for function minimization. The Computer Journal, 7(4):308–313, 01 1965
  26. Petit, S. J., Bect, J., and Vazquez, E. Relaxed Gaussian process interpolation: a goal-oriented approach to Bayesian optimization. Journal of Machine Learning Research, 26(195):1–70, 2025
  27. Picheny, V., Vakili, S., and Artemev, A. Ordinal Bayesian optimisation, 2019. URL https://arxiv.org/abs/1912.02493
  28. Picheny, V., Moss, H., Torossian, L., and Durrande, N. Bayesian quantile and expectile optimisation. In Cussens, J. and Zhang, K. (eds.), Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence, volume 180 of Proc. Mach. Learn. Res., pp. 1623–1633. PMLR, 01–05 Aug 2022
  29. Pion, A. and Vazquez, E. Design-marginal calibration of Gaussian process predictive distributions: Bayesian and conformal approaches, 2025. URL https://arxiv.org/abs/2512.05611
  30. Sahoo, R., Zhao, S., Chen, A., and Ermon, S. Reliable decisions with threshold calibration. In Ranzato, M., Beygelzimer, A., Dauphin, Y., Liang, P., and Vaughan, J. W. (eds.), Adv. Neural Inf. Process. Syst., volume 34, pp. 1831–1844. Curran Associates, Inc., 2021
  31. Schonlau, M. and Welch, W. J. Global optimization with nonparametric function fitting. In Proceedings of the ASA, Section on Physical and Engineering Sciences, pp. 183–186. Amer. Statist. Assoc., 1996
  32. SciPy Developers. SciPy Project, 2026. URL https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gennorm.html. SciPy API Reference, accessed 2026-01
  33. Scott, D. W. Multivariate Density Estimation: Theory, Practice, and Visualization. John Wiley & Sons, New York, 1992
  34. Shimodaira, H. Improving predictive inference under covariate shift by weighting the log-likelihood function. J. Statist. Plann. Inference, 90(2):227–244, 2000
  35. Srinivas, N., Krause, A., Kakade, S. M., and Seeger, M. Gaussian process optimization in the bandit setting: No regret and experimental design. In Proc. 27th International Conference on Machine Learning (ICML 2010), pp. 1015–1022, 2010
  36. Stanton, S., Maddox, W., and Wilson, A. G. Bayesian optimization with conformal prediction sets. In Ruiz, F., Dy, J., and van de Meent, J. W. (eds.), Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, volume 206 of Proc. Mach. Learn. Res., pp. 959–986. PMLR, 25–27 Apr 2023
  37. Stein, M. L. Interpolation of Spatial Data: Some Theory for Kriging. Springer Ser. Stat. Springer New York, 1999
  38. Surjanovic, S. and Bingham, D. Virtual library of simulation experiments: Test functions and datasets, 2013. URL https://www.sfu.ca/~ssurjano/. Accessed November 2025
  39. Tom, G., Lo, S., Corapi, S., Aspuru-Guzik, A., and Sanchez-Lengeling, B. Ranking over regression for Bayesian optimization and molecule selection. APL Machine Learning, 3(3):036113, 08 2025
  40. Tuo, R. and Wang, W. Uncertainty quantification for Bayesian optimization. In Camps-Valls, G., Ruiz, F. J. R., and Valera, I. (eds.), Proceedings of the 25th International Conference on Artificial Intelligence and Statistics (AISTATS), volume 151 of Proc. Mach. Learn. Res., pp. 2862–2884. PMLR, 2022
  41. Vazquez, E. Gpmp: the Gaussian process micro package, 2026. URL https://github.com/gpmp-dev/gpmp
  42. Vazquez, E. and Bect, J. Convergence properties of the expected improvement algorithm with fixed mean and covariance functions. J. Statist. Plann. Inference, 140:3088–3095, 11 2010
  43. Villemonteix, J., Vazquez, E., and Walter, E. An informational approach to the global optimization of expensive-to-evaluate functions. J. Global Optim., 44(4):509–534, 2009
  44. Vovk, V., Gammerman, A., and Shafer, G. Algorithmic Learning in a Random World. Springer, 2005
  45. Vovk, V., Shen, J., Manokhin, V., and Xie, M. Nonparametric predictive distributions based on conformal prediction. Mach. Learn., 108(3):445–474, 2019
  46. Zhang, Y. and Candès, E. J. Posterior conformal prediction, 2024. URL https://arxiv.org/abs/2409.19712