Goal-Oriented Lower-Tail Calibration of Gaussian Processes for Bayesian Optimization

Aurélien Pion; Emmanuel Vazquez

arxiv: 2605.20145 · v1 · pith:A7INK2HJnew · submitted 2026-05-19 · 📊 stat.ML · cs.LG · stat.ME

Pith reviewed 2026-05-20 03:07 UTC · model grok-4.3

classification 📊 stat.ML · cs.LG · stat.ME

keywords Bayesian optimization · Gaussian processes · predictive calibration · lower-tail calibration · expected improvement · spatial calibration · noiseless setting

The pith

A post-hoc method calibrates Gaussian process lower tails below a threshold for Bayesian optimization while keeping the search algorithm dense in the design space.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Gaussian process models in Bayesian optimization can miscalibrate their predictions in the lower tail, which directly affects sampling decisions for minimization problems that rely on criteria such as expected improvement. The paper develops a goal-oriented approach that targets predictive reliability specifically below a chosen low threshold t rather than across the whole distribution. It introduces a spatial calibration framework built on occurrence calibration over the full design space and thresholded μ-calibration restricted to sublevel sets where the objective is at most t. Using this framework, the authors construct tcGP, a post-hoc adjustment applied after standard maximum-likelihood training, and prove that the resulting expected-improvement optimizer still visits every region of the space densely.
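To make the dependence on the lower tail concrete, here is a minimal sketch of the standard closed forms for probability of improvement and expected improvement under a Gaussian predictive distribution, for minimization. The function names are illustrative and are not taken from the paper; the formulas are the textbook ones for an uncalibrated GP, not the tcGP-adjusted criterion.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, provides Phi (cdf) and phi (pdf)


def probability_of_improvement(mu: float, sigma: float, best: float) -> float:
    """P(f(x) <= best) when f(x) ~ N(mu, sigma^2)."""
    if sigma <= 0.0:
        # Degenerate predictive (e.g. at an observed point in the noiseless case).
        return 1.0 if mu <= best else 0.0
    return _STD_NORMAL.cdf((best - mu) / sigma)


def expected_improvement(mu: float, sigma: float, best: float) -> float:
    """E[max(best - f(x), 0)] when f(x) ~ N(mu, sigma^2)."""
    if sigma <= 0.0:
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    return (best - mu) * _STD_NORMAL.cdf(z) + sigma * _STD_NORMAL.pdf(z)


# Two hypothetical candidates with the same predictive mean but different spread:
# only the mass below `best` contributes, so the wider predictive scores higher.
best_so_far = 0.0
narrow = expected_improvement(mu=0.5, sigma=0.2, best=best_so_far)
wide = expected_improvement(mu=0.5, sigma=1.0, best=best_so_far)
```

Both quantities are functionals of the predictive distribution restricted to values at or below the current best, which is why an error in the lower tail changes which candidate is selected.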

Core claim

In the noiseless setting, standard Gaussian processes with maximum-likelihood hyperparameters can be post-hoc calibrated below a low threshold t via the tcGP procedure so that their predictive distributions satisfy occurrence calibration over the design space and thresholded μ-calibration on sublevel sets; the expected-improvement acquisition function built on these calibrated distributions produces a global optimization algorithm that remains dense in the design space and yields improved lower-tail calibration together with better optimization performance on standard benchmarks relative to both uncalibrated and globally calibrated Gaussian processes.

What carries the argument

tcGP, the post-hoc calibration procedure that enforces occurrence calibration across the design space and thresholded μ-calibration on sublevel sets of the form {x : f(x) ≤ t}.
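As a rough illustration of what an occurrence-type check could look like, the sketch below compares the average predicted probability of the event {f(x) ≤ t} with its observed frequency on a set of test points. This is one plausible reading only: the paper's exact definition of occurrence calibration is not reproduced in this summary, and the function name `occurrence_discrepancy` and the Gaussian predictive are assumptions made for the example.

```python
from statistics import NormalDist


def occurrence_discrepancy(means, stds, values, t):
    """Absolute gap between predicted and observed frequency of {f(x) <= t}.

    means, stds: Gaussian predictive mean and standard deviation at each test
                 point (stds must be strictly positive).
    values:      true objective values at the same test points.
    t:           the low threshold of interest.
    """
    n = len(values)
    # Average predicted probability that the objective falls at or below t.
    predicted = sum(NormalDist(m, s).cdf(t) for m, s in zip(means, stds)) / n
    # Fraction of test points where the objective actually falls at or below t.
    observed = sum(1 for y in values if y <= t) / n
    return abs(predicted - observed)
```

A value near zero says the model predicts the right overall share of the design space below t; it says nothing about whether the predictive distributions are well shaped inside that region, which is what the thresholded μ-calibration notion addresses.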

If this is right

The expected-improvement optimizer that uses tcGP remains dense in the design space.
Lower-tail calibration of the predictive distributions improves relative to both standard and globally calibrated Gaussian processes.
Bayesian optimization performance improves on standard benchmark functions compared with uncalibrated and globally calibrated models.
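One generic way to probe the lower-tail calibration mentioned in the second point is a Kolmogorov–Smirnov distance on conditional PIT values. The sketch below assumes Gaussian predictive distributions and uses the rescaled PIT F(y | x) / F(t | x) for observations with y ≤ t, which is uniform on [0, 1] when the predictive tail is correct. It is not claimed to match the paper's tKS–PIT metric; the function name and the rescaling are assumptions made for illustration.

```python
from statistics import NormalDist


def lower_tail_pit_ks(means, stds, values, t):
    """KS distance to Uniform(0, 1) of conditional PIT values below t.

    For each point with value y <= t, the conditional PIT is
    F(y | x) / F(t | x), with F(. | x) the CDF of N(mean, std^2).
    Returns None when no value falls at or below t.
    """
    pits = []
    for m, s, y in zip(means, stds, values):
        if y <= t:
            dist = NormalDist(m, s)  # requires s > 0
            pits.append(dist.cdf(y) / dist.cdf(t))
    if not pits:
        return None
    pits.sort()
    n = len(pits)
    # Classical one-sample KS statistic against the uniform CDF u -> u.
    return max(max((i + 1) / n - u, u - i / n) for i, u in enumerate(pits))
```

With few points below t the statistic is noisy, so in practice it would be computed on a large test set rather than on the handful of BO evaluations.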

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Selective lower-tail calibration may preserve useful upper-tail behavior that global calibration would alter.
The same spatial-calibration ideas could be tested on acquisition functions other than expected improvement that also depend on lower-tail accuracy.
Because the method works with existing maximum-likelihood fits, it can be inserted into current Gaussian-process Bayesian optimization pipelines without retraining.

Load-bearing premise

The post-hoc adjustment can be applied to ordinary maximum-likelihood Gaussian processes without creating inconsistencies that would destroy the density property of the resulting expected-improvement optimizer.

What would settle it

An experiment in which the expected-improvement algorithm driven by tcGP-calibrated predictions either fails to explore the design space densely or produces worse optimization performance than a standard uncalibrated Gaussian process on a benchmark function.

Figures

Figures reproduced from arXiv: 2605.20145 by Aurélien Pion, Emmanuel Vazquez.

**Figure 1.** Comparison between a standard GP and a model calibrated below $q_{\delta,n}$ (TCGP), with $\delta = 0.3$, on a standard one-dimensional test function and with $n = 10$ evaluations. Left: observations with GP predictions; the current design does not include a point near the global minimizer. Middle: probabilities of improvement $\hat F^{(0)}_n(m_n \mid x)$ (GP, black) and $\hat F^{(1)}_n(m_n \mid x)$ (TCGP, red), where $m_n = \min_{i \le n} f(X_i)$. Right: EI…

**Figure 2.** BO performance and calibration metrics. From left to right: median and 10%/90% quantiles across runs of the estimated excursion probability $p_{m_n} = P(f(X) \le m_n)$ with $X \sim \mathcal{U}(\mathbb{X})$; median twCRPS; median occurrence discrepancy $r_t$; and median tKS–PIT. Calibration metrics are evaluated on a test set at the current best value $m_n$. Results are shown for a standard GP and three TCGP variants using $J$, tKS–PIT, or $r_t$ as t…

**Figure 3.** BO performance summarized by the excursion probability below the current best value. For each run and iteration $n$, we estimate $p_{m_n} = P(f(X) \le m_n)$ with $X \sim \mathcal{U}(\mathbb{X})$, and report the median and 10%/90% quantiles of $p_{m_n}$ across runs for Goldstein–Price, Dixon–Price ($d = 4$), Rosenbrock ($d = 6$), and Ackley ($d = 4$). Methods: GP, BCRGP, onGP, REGP ($\delta = 0.25$), TCGP ($\delta = 0.05$).

**Figure 4.** Training time for REGP, GP, TCGP, onGP, BCRGP at each iteration of BO.

**Figure 5.** Comparison of TCGP variants. Left: median and 10%/90% quantiles of $p_{m_n} = P(f(X) \le m_n)$, where $m_n$ is the best observed value after $n$ evaluations and $X \sim \mu$. Right: fraction of runs reaching the prescribed target value.

**Figure 6.** Sensitivity of REGP to $\delta$ (results shown after $n = 100$ BO iterations). Left: median and 10%/90% quantiles of $p_{m_n} = P(f(X) \le m_n)$, where $m_n$ is the best value observed after $n$ evaluations and $X \sim \mu$. Right: fraction of runs reaching the prescribed target level.

**Figure 7.** Sensitivity of TCGP to $\delta$ (results shown after $n = 100$ BO iterations). Left: median and 10%/90% quantiles of $p_{m_n} = P(f(X) \le m_n)$, where $m_n$ is the best value observed after $n$ evaluations and $X \sim \mu$. Right: fraction of runs reaching the prescribed target level.

**Figure 8.** Comparison of the BO performance with EI as sampling criterion for GP, REGP with $\delta = 0.25$, onGP and TCGP with $\delta = 0.05$. Left: median and 10%/90% quantiles of $p_{m_n} = P(f(X) \le m_n)$, where $m_n$ is the best observed value so far. Right: fraction of successful runs reaching a prescribed target level.

**Figure 9.** Comparison of the BO performance with EI as sampling criterion for GP, REGP with $\delta = 0.25$, onGP and TCGP with $\delta = 0.05$. Left: median and 10%/90% quantiles of $p_{m_n} = P(f(X) \le m_n)$, where $m_n$ is the best observed value so far. Right: fraction of successful runs reaching a prescribed target level.

**Figure 10.** Comparison of the BO performance in moderate dimension ($d = 10$ to $d = 20$) for 150 iterations with EI as sampling criterion for GP, REGP with $\delta = 0.25$, onGP and TCGP with $\delta = 0.05$. Left: median and 10%/90% quantiles of $p_{m_n} = P(f(X) \le m_n)$, where $m_n$ is the best observed value so far. Right: fraction of successful runs reaching a prescribed target level.

**Figure 11.** Comparison of the BO performance with UCB ($\varepsilon = 0.1$) as sampling criterion for GP and TCGP with $\delta = 0.05$. Left: median and 10%/90% quantiles of $p_{m_n} = P(f(X) \le m_n)$, where $m_n$ is the best observed value so far. Right: fraction of successful runs reaching a prescribed target level.

**Figure 12.** Comparison of the BO performance with UCB ($\varepsilon = 0.1$) as sampling criterion for GP and TCGP with $\delta = 0.05$. Left: median and 10%/90% quantiles of $p_{m_n} = P(f(X) \le m_n)$, where $m_n$ is the best observed value so far. Right: fraction of successful runs reaching a prescribed target level.
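Several of the captions above report the excursion probability below the current best value, P(f(X) ≤ m_n) with X uniform on the design space. A minimal Monte Carlo sketch of that quantity follows, assuming a box-shaped design space. The helper name, the sample size, and the toy objective are illustrative choices, not the paper's experimental setup.

```python
import random


def excursion_probability(f, bounds, level, n_samples=20_000, seed=0):
    """Monte Carlo estimate of P(f(X) <= level) for X uniform on a box.

    bounds: list of (low, high) pairs, one per input dimension.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        if f(x) <= level:
            hits += 1
    return hits / n_samples


def sphere(x):
    """Toy objective for illustration only (not one of the paper's benchmarks)."""
    return sum(xi * xi for xi in x)


# On [-1, 1]^2 the set {sphere <= 0.25} is a disc of radius 0.5, so the exact
# probability is pi * 0.25 / 4, roughly 0.196; the estimate should be close.
p = excursion_probability(sphere, [(-1.0, 1.0), (-1.0, 1.0)], level=0.25)
```

A smaller value means the best observed point sits deeper in the lower tail of the objective, so the metric is comparable across test functions with different output scales.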

Original abstract

Bayesian optimization (BO) selects evaluation points for expensive black-box objectives using Gaussian process (GP) predictive distributions. Kernel choice and hyperparameter selection can lead to miscalibrated predictive distributions and an inappropriate exploration-exploitation trade-off. For minimization, sampling criteria such as expected improvement (EI) depend on the predictive distribution below the current best value, so lower-tail miscalibration directly affects the sampling decision. This article studies goal-oriented calibration of GP predictive distributions below a low threshold $t$ in the noiseless setting, for standard GP models with hyperparameters selected by maximum likelihood. A framework for predictive reliability below $t$ is introduced, based on two notions of spatial calibration: occurrence calibration over the design space and thresholded $\mu$-calibration on sublevel sets of the form $\{x\in\mathbb{X}, f(x)\le t\}$. Building on this framework, we propose tcGP, a post-hoc method that calibrates GP predictive distributions below $t$, and we show that the resulting EI-based global optimization algorithm remains dense in the design space. Experiments on standard benchmarks show improved lower-tail calibration and BO performance relative to standard GP models and globally calibrated GP models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

This paper offers a goal-oriented lower-tail calibration for GPs in Bayesian optimization along with a density result for the EI optimizer, though the sequential update of the threshold t raises questions about the proof's applicability.

Hi colleague,

The punchline is that they have developed tcGP, a post-hoc calibration method for the lower tail of GP predictions below a threshold t, tailored for improving expected improvement in BO. They also show that the resulting optimization procedure remains dense in the design space.

What the paper does well is define a spatial calibration framework with occurrence calibration and thresholded μ-calibration on sublevel sets. This is a specific extension for the BO context rather than a generic fix. The experiments demonstrate gains in lower-tail accuracy and overall optimization performance over standard and globally calibrated GPs on benchmarks. Credit to them for focusing on the part of the predictive distribution that actually drives the sampling decisions in minimization.

The soft spots are around the theoretical side and the post-hoc nature. The density claim is interesting, but the stress-test concern holds some weight: since t updates to the new incumbent after each evaluation, the calibrated region changes. The abstract says they show the EI-based algorithm remains dense, but without seeing if the argument explicitly handles these data-dependent shifts or just assumes fixed t, it's hard to be fully convinced. Also, applying calibration after MLE on the same data might create some dependence, even if mild.

This is for people in the GP and BO community who want to tweak predictive reliability for better exploration. A reader looking for practical improvements with some theory would get value. It deserves a serious referee because the idea is targeted and the experiments back it up, even if more work is needed on the dynamic aspects. I recommend putting it through peer review.

Referee Report

2 major / 2 minor

Summary. The paper proposes tcGP, a post-hoc goal-oriented calibration method for the lower tail of Gaussian process predictive distributions below a threshold t in the noiseless setting. It introduces a spatial calibration framework with occurrence calibration and thresholded μ-calibration on sublevel sets, shows that the resulting EI-based global optimization algorithm remains dense in the design space, and reports experimental improvements in lower-tail calibration and BO performance over standard GPs and globally calibrated models.

Significance. If the density result holds under sequential updates, the work provides a theoretically supported way to address lower-tail miscalibration in GP-based BO without altering the core model or MLE fitting. The experimental gains on benchmarks suggest practical value for improving exploration-exploitation balance in minimization tasks. The contribution is strengthened by the explicit density claim but tempered by reliance on post-hoc adjustment preserving key properties.

major comments (2)

[Theoretical results on density (likely §4 or §5)] The central density claim for the EI-based optimizer under tcGP (stated in the abstract and presumably proved in the theoretical section) relies on properties of the calibrated predictive distribution. However, the skeptic note indicates that the argument may assume fixed t, while sequential BO updates t to the current incumbent after each observation, dynamically redefining the calibrated sublevel sets. Please specify the section containing the density proof and clarify whether the continuity, strict positivity away from observed points, and uniform continuity arguments are re-verified after each such data-dependent update, or if an extension is provided.
[Framework and tcGP definition (likely §3)] Abstract and introduction claim that tcGP is applied to standard GP models with MLE hyperparameters without introducing inconsistencies that invalidate the density property. However, since calibration parameters are fit post-hoc (potentially to the same data), it is unclear how this preserves the required predictive properties for the density argument. Cite the specific result or lemma that shows the calibrated mean/variance functions remain sufficiently regular for the EI acquisition to satisfy the conditions for density in the design space.

minor comments (2)

[Abstract and §1] The abstract mentions 'noiseless setting' but does not clarify if the framework or experiments extend to noisy observations; if the paper restricts to noiseless, this should be stated more prominently in the introduction.
[Notation section] Notation for the threshold t and sublevel sets {x : f(x) ≤ t} is introduced but could be made more consistent when t is updated to the incumbent value; consider adding a remark on the time-dependence.
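The time-dependence raised in these comments can be made concrete with a small sketch. The Figure 1 caption describes calibration below q_{δ,n}, while the comments above speak of t tracking the incumbent; the code below computes both as functions of the observations so far. Reading q_{δ,n} as the empirical δ-quantile of the n observed values is an assumption of this example, as are the helper names and the observation values.

```python
import math


def incumbent(values):
    """Best (smallest) observed value m_n."""
    return min(values)


def empirical_quantile(values, delta):
    """Empirical delta-quantile: the ceil(delta * n)-th smallest observed value.

    This convention is an assumption; the paper may define q_{delta,n} differently.
    """
    ordered = sorted(values)
    k = max(1, math.ceil(delta * len(ordered)))
    return ordered[k - 1]


# Hypothetical sequence of noiseless evaluations, in the order they were made.
observations = [3.2, 1.7, 2.9, 0.8, 2.1, 1.1, 0.5, 1.9]
for n in range(4, len(observations) + 1):
    seen = observations[:n]
    # Both candidate thresholds are data-dependent and can move at every iteration.
    print(n, incumbent(seen), empirical_quantile(seen, 0.3))
```

Either choice makes the sublevel set {x : f(x) ≤ t} a moving target, which is the reason a remark on the dependence of t on n would help the notation.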

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and insightful comments on the theoretical foundations of tcGP. We address each major comment below with clarifications drawn directly from the manuscript and commit to revisions that improve the presentation without altering the core claims.

Referee: [Theoretical results on density (likely §4 or §5)] The central density claim for the EI-based optimizer under tcGP (stated in the abstract and presumably proved in the theoretical section) relies on properties of the calibrated predictive distribution. However, the skeptic note indicates that the argument may assume fixed t, while sequential BO updates t to the current incumbent after each observation, dynamically redefining the calibrated sublevel sets. Please specify the section containing the density proof and clarify whether the continuity, strict positivity away from observed points, and uniform continuity arguments are re-verified after each such data-dependent update, or if an extension is provided.

Authors: The density result for the EI-based optimizer is established in Section 4. The proof is stated for a general fixed threshold t and relies on the calibrated predictive mean and variance satisfying continuity, strict positivity of variance away from observed points, and uniform continuity on compact sets. In the sequential BO procedure, t is set to the current incumbent at each iteration and the calibration is re-applied using that t; because the calibration map is continuous in t and the underlying GP kernel properties are unchanged, the same regularity conditions hold at every step. We will insert a short remark in Section 4 explicitly noting that the argument applies iteratively with the data-dependent t.
Revision: partial
Referee: [Framework and tcGP definition (likely §3)] Abstract and introduction claim that tcGP is applied to standard GP models with MLE hyperparameters without introducing inconsistencies that invalidate the density property. However, since calibration parameters are fit post-hoc (potentially to the same data), it is unclear how this preserves the required predictive properties for the density argument. Cite the specific result or lemma that shows the calibrated mean/variance functions remain sufficiently regular for the EI acquisition to satisfy the conditions for density in the design space.

Authors: Section 3 defines tcGP via occurrence calibration and thresholded μ-calibration applied only on the sublevel set {x : f(x) ≤ t}. The post-hoc adjustment is constructed so that the calibrated mean remains continuous and the calibrated variance remains strictly positive away from observed points whenever the original GP satisfies these properties (which it does under standard MLE fitting with a continuous kernel). This preservation is stated as part of the framework in Section 3 and is used directly to invoke the density theorem in Section 4. We will add an explicit forward reference to the relevant paragraph in Section 3 both in the abstract and at the end of the introduction.
Revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation of tcGP density property

The paper defines a spatial calibration framework (occurrence calibration and thresholded μ-calibration on sublevel sets) and applies post-hoc tcGP calibration to standard MLE-fitted GPs in the noiseless case. The claim that the resulting EI optimizer remains dense follows from continuity and positivity properties of the calibrated predictive distributions on compact sets, without reducing the density result to a fitted parameter renamed as prediction or to a self-citation chain. No equations equate the target density statement to its inputs by construction, and the framework is presented as an independent extension rather than an ansatz smuggled via prior self-work. The derivation is self-contained against the introduced calibration notions and standard GP-EI arguments.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The work rests on standard GP assumptions plus the new calibration framework; one domain assumption and limited free parameters for the post-hoc step.

free parameters (1)

lower-tail calibration parameters
Post-hoc adjustment parameters chosen to achieve occurrence and thresholded μ-calibration below t.

axioms (1)

domain assumption: Noiseless observations and standard GP with MLE hyperparameters
Explicitly stated as the setting for the study and method.

pith-pipeline@v0.9.0 · 5740 in / 1226 out tokens · 37413 ms · 2026-05-20T03:07:28.522596+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

We model $R_n(X, f(X))$ with a generalized normal (GN) family... $J(\beta,\lambda) = \sup \bigl| P_n(U^{\beta,\lambda}_t \le u \mid f(X) \le t) - u\,\kappa^{\beta,\lambda}_t \bigr|$

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Proposition 5.3... $(X_n)$ is dense in $\mathbb{X}$ (NEB property, fixed hyperparameters, compact parameter box for $(\beta_n, \lambda_n)$)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

46 extracted references · 46 canonical work pages

[1]

Weighted verification tools to evaluate univariate and multivariate probabilistic forecasts for high-impact weather events

Allen, S., Bhend, J., Martius, O., and Ziegel, J. Weighted verification tools to evaluate univariate and multivariate probabilistic forecasts for high-impact weather events. Weather and Forecasting, 38 0 (3): 0 499 -- 516, 2023

work page 2023
[2]

Tail calibration of probabilistic forecasts

Allen, S., Koh, J., Segers, J., and Ziegel, J. Tail calibration of probabilistic forecasts. J. Amer. Statist. Assoc., 120 0 (552): 0 2796--2808, 2025

work page 2025
[3]

Finite-time analysis of the multiarmed bandit problem

Auer, P., Cesa - Bianchi, N., and Fischer, P. Finite-time analysis of the multiarmed bandit problem. Mach. Learn., 47: 0 235--256, 2002

work page 2002
[4]

B ayesian subset simulation

Bect, J., Li, L., and Vazquez, E. B ayesian subset simulation. SIAM/ASA Journal on Uncertainty Quantification, 5 0 (1): 0 762--786, 2017

work page 2017
[5]

and Krause, A

Bogunovic, I. and Krause, A. Misspecified G aussian process bandit optimization. In Ranzato, M., Beygelzimer, A., Dauphin, Y., Liang, P., and Vaughan, J. W. (eds.), Adv. Neural Inf. Process. Syst., volume 34, pp.\ 3004--3015. Curran Associates, Inc., 2021

work page 2021
[6]

Bull, A. D. Convergence rates of efficient global optimization algorithms. J. Mach. Learn. Res., 12 0 (88): 0 2879--2904, 2011

work page 2011
[7]

and Ghahramani, Z

Chu, W. and Ghahramani, Z. G aussian processes for ordinal regression. J. Mach. Learn. Res., 6 0 (35): 0 1019--1041, 2005

work page 2005
[8]

Online calibrated and conformal prediction improves B ayesian optimization

Deshpande, S., Marx, C., and Kuleshov, V. Online calibrated and conformal prediction improves B ayesian optimization. In Dasgupta, S., Mandt, S., and Li, Y. (eds.), Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, volume 238 of Proc. Mach. Learn. Res., pp.\ 1450--1458. PMLR, 02--04 May 2024

work page 2024
[9]

A B ayesian approach to constrained single- and multi-objective optimization

Feliot, P., Bect, J., and Vazquez, E. A B ayesian approach to constrained single- and multi-objective optimization. J. Global Optim., 67 0 (1-2): 0 97--133, April 2016

work page 2016
[10]

Forrester, A. I. J., S \'o bester, A., and Keane, A. J. Engineering Design via Surrogate Modelling: A Practical Guide. John Wiley & Sons, 2008

work page 2008
[11]

CRPS -based targeted sequential design with application in chemical space, 2025

Friedli, L., Gautier, A., Broccard, A., and Ginsbourger, D. CRPS -based targeted sequential design with application in chemical space, 2025. URL https://arxiv.org/abs/2503.11250

work page arXiv 2025
[12]

and Resin, J

Gneiting, T. and Resin, J. Regression diagnostics meets forecast evaluation: conditional calibration, reliability diagrams, and coefficient of determination. Electron. J. Stat., 17 0 (2), January 2023

work page 2023
[13]

Gneiting, T., Balabdaoui, F., and Raftery, A. E. Probabilistic forecasts, calibration and sharpness. J. R. Stat. Soc. Ser. B Stat. Methodol., 69 0 (2): 0 243--268, 2007

work page 2007
[14]

Gneiting, T. G. and Raftery, A. E. Strictly proper scoring rules, prediction, and estimation. J. Am. Stat. Assoc., 102 0 (477): 0 359--378, 2007

work page 2007
[15]

S., and Liu, H

Guo, Z., Ong, Y. S., and Liu, H. Calibrated and recalibrated expected improvements for B ayesian optimization. Struct. Multidiscip. Optim., 64: 0 3549--3567, 2021

work page 2021
[16]

and Ren, Z

Jin, Y. and Ren, Z. Confidence on the focal: conformal prediction with selection-conditional coverage. J. R. Stat. Soc. Ser. B. Stat. Methodol., 87 0 (4): 0 1239--1259, 04 2025

work page 2025
[17]

Efficient global optimization of expensive black-box functions

Jones, D., Schonlau, M., and Welch, W. Efficient global optimization of expensive black-box functions. J. Global Optim., 13: 0 455--492, 12 1998

work page 1998
[18]

Robust B ayesian optimization via localized online conformal prediction

Kim, D., Zecchin, M., Park, S., Kang, J., and Simeone, O. Robust B ayesian optimization via localized online conformal prediction. IEEE Trans. Signal Process., 73: 0 2039--2052, 2025

work page 2039
[19]

Lai, T. L. and Robbins, H. Asymptotically efficient adaptive allocation rules. Adv. Appl. Math., 6 0 (1): 0 4--22, 1985

work page 1985
[20]

Matheson, J. E. and Winkler, R. L. Scoring rules for continuous probability distributions. Manag. Sci., 22 0 (10): 0 1087--1096, 1976

work page 1976
[21]

Q., Hahn, G

Meeker, W. Q., Hahn, G. J., and Escobar, L. A. Statistical Intervals: A Guide for Practitioners and Researchers. John Wiley & Sons, Hoboken, New Jersey, second edition, 2017. ISBN 978-0-471-68717-7

work page 2017
[22]

The application of B ayesian methods for seeking the extremum

Mockus, J., Tiesis, V., and Zilinskas, A. The application of B ayesian methods for seeking the extremum. In Dixon, L. C. W. and Szeg \"o , G. P. (eds.), Towards Global Optimisation, volume 2, pp.\ 117--129. North-Holland, Amsterdam, 1978

work page 1978
[23]

A generalized normal distribution

Nadarajah, S. A generalized normal distribution. J. Appl. Stat., 32 0 (7): 0 685--694, 2005

work page 2005
[24]

and Ramdas, A

Neiswanger, W. and Ramdas, A. Uncertainty quantification using martingales for misspecified G aussian processes. In Feldman, V., Ligett, K., and Sabato, S. (eds.), Proceedings of the 32nd International Conference on Algorithmic Learning Theory, volume 132 of Proc. Mach. Learn. Res., pp.\ 963--982. PMLR, 16--19 Mar 2021

work page 2021
[25]

Nelder, J. A. and Mead, R. A simplex method for function minimization. The Computer Journal, 7 0 (4): 0 308--313, 01 1965

work page 1965
[26]

J., Bect, J., and Vazquez, E

Petit, S. J., Bect, J., and Vazquez, E. Relaxed G aussian process interpolation: a goal-oriented approach to B ayesian optimization. Journal of Machine Learning Research, 26 0 (195): 0 1--70, 2025

work page 2025
[27]

Ordinal B ayesian optimisation, 2019

Picheny, V., Vakili, S., and Artemev, A. Ordinal B ayesian optimisation, 2019. URL https://arxiv.org/abs/1912.02493

work page arXiv 2019
[28]

B ayesian quantile and expectile optimisation

Picheny, V., Moss, H., Torossian, L., and Durrande, N. B ayesian quantile and expectile optimisation. In Cussens, J. and Zhang, K. (eds.), Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence, volume 180 of Proc. Mach. Learn. Res., pp.\ 1623--1633. PMLR, 01--05 Aug 2022

work page 2022
[29]

and Vazquez, E

Pion, A. and Vazquez, E. Design-marginal calibration of G aussian process predictive distributions: B ayesian and conformal approaches, 2025. URL https://arxiv.org/abs/2512.05611

work page arXiv 2025
[30]

Reliable decisions with threshold calibration

Sahoo, R., Zhao, S., Chen, A., and Ermon, S. Reliable decisions with threshold calibration. In Ranzato, M., Beygelzimer, A., Dauphin, Y., Liang, P., and Vaughan, J. W. (eds.), Adv. Neural Inf. Process. Syst., volume 34, pp.\ 1831--1844. Curran Associates, Inc., 2021

work page 2021
[31]

and Welch, W

Schonlau, M. and Welch, W. J. Global optimization with nonparametric function fitting. In Proceedings of the ASA, Section on Physical and Engineering Sciences, pp.\ 183--186. Amer. Statist. Assoc., 1996

work page 1996
[32]

SciPy Project , 2026

SciPy Developers . SciPy Project , 2026. URL https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gennorm.html. SciPy API Reference, accessed 2026-01

work page 2026
[33]

Scott, D. W. Multivariate Density Estimation: Theory, Practice, and Visualization. John Wiley & Sons, New York, 1992

work page 1992
[34]

Improving predictive inference under covariate shift by weighting the log-likelihood function

Shimodaira, H. Improving predictive inference under covariate shift by weighting the log-likelihood function. J. Statist. Plann. Inference, 90 0 (2): 0 227--244, 2000

work page 2000
[35]

M., and Seeger, M

Srinivas, N., Krause, A., Kakade, S. M., and Seeger, M. G aussian process optimization in the bandit setting: No regret and experimental design. In Proc. 27th International Conference on Machine Learning (ICML 2010), pp.\ 1015--1022, 2010

work page 2010
[36]

Stanton, S., Maddox, W., and Wilson, A. G. B ayesian optimization with conformal prediction sets. In Ruiz, F., Dy, J., and van de Meent, J. W. (eds.), Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, volume 206 of Proc. Mach. Learn. Res., pp.\ 959--986. PMLR, 25--27 Apr 2023

work page 2023
[37]

Stein, M. L. Interpolation of Spatial Data: Some Theory for Kriging. Springer Ser. Stat. Springer New York, 1999

work page 1999
[38]

and Bingham, D

Surjanovic, S. and Bingham, D. Virtual library of simulation experiments: Test functions and datasets, 2013. URL https://www.sfu.ca/ ssurjano/. Accessed November 2025

work page 2013
[39]

Ranking over regression for B ayesian optimization and molecule selection

Tom, G., Lo, S., Corapi, S., Aspuru-Guzik, A., and Sanchez-Lengeling, B. Ranking over regression for B ayesian optimization and molecule selection. APL Machine Learning, 3 0 (3): 0 036113, 08 2025

work page 2025
[40]

and Wang, W

Tuo, R. and Wang, W. Uncertainty quantification for B ayesian optimization. In Camps-Valls, G., Ruiz, F. J. R., and Valera, I. (eds.), Proceedings of the 25th International Conference on Artificial Intelligence and Statistics (AISTATS), volume 151 of Proc. Mach. Learn. Res., pp.\ 2862--2884. PMLR, 2022

work page 2022
[41]

Gpmp: the G aussian process micro package, 2026

Vazquez, E. Gpmp: the G aussian process micro package, 2026. URL https://github.com/gpmp-dev/gpmp

work page 2026
[42]

and Bect, J

Vazquez, E. and Bect, J. Convergence properties of the expected improvement algorithm with fixed mean and covariance functions. J. Statist. Plann. Inference, 140: 0 3088--3095, 11 2010

work page 2010
[43]

An informational approach to the global optimization of expensive-to-evaluate functions

Villemonteix, J., Vazquez, E., and Walter, E. An informational approach to the global optimization of expensive-to-evaluate functions. J. Global Optim., 44 0 (4): 0 509--534, 2009

[44]

Vovk, V., Gammerman, A., and Shafer, G. Algorithmic Learning in a Random World. Springer, 2005

[45]

Vovk, V., Shen, J., Manokhin, V., and Xie, M. Nonparametric predictive distributions based on conformal prediction. Mach. Learn., 108(3):445–474, 2019

[46]

Zhang, Y. and Candès, E. J. Posterior conformal prediction, 2024. URL https://arxiv.org/abs/2409.19712


