Goal-Oriented Lower-Tail Calibration of Gaussian Processes for Bayesian Optimization
Pith reviewed 2026-05-20 03:07 UTC · model grok-4.3
The pith
A post-hoc method calibrates Gaussian process lower tails below a threshold for Bayesian optimization while keeping the search algorithm dense in the design space.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In the noiseless setting, standard Gaussian processes with maximum-likelihood hyperparameters can be calibrated post hoc below a low threshold t via the tcGP procedure, so that their predictive distributions satisfy occurrence calibration over the design space and thresholded μ-calibration on sublevel sets. The expected-improvement acquisition function built on these calibrated distributions yields a global optimization algorithm that remains dense in the design space. On standard benchmarks, it shows better lower-tail calibration and better optimization performance than both uncalibrated and globally calibrated Gaussian processes.
What carries the argument
tcGP, the post-hoc calibration procedure that enforces occurrence calibration across the design space and thresholded μ-calibration on sublevel sets of the form {x : f(x) ≤ t}.
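The paper defines both calibration notions precisely, and those definitions are not reproduced here. As a rough illustration only, the sketch below assumes a Gaussian predictive distribution N(μ(x), σ(x)²) and reads occurrence calibration as agreement between the predicted probability p(x) = P(ξ(x) ≤ t) = Φ((t − μ(x))/σ(x)) and the observed indicator 1{f(x) ≤ t}, checked with a binned reliability table. The function names and the binning scheme are hypothetical and are not the paper's estimator.

```python
import numpy as np
from scipy.stats import norm

def occurrence_prob(mu, sigma, t):
    """Predicted probability that the value at x falls below t (Gaussian predictive assumed)."""
    return norm.cdf((t - np.asarray(mu, float)) / np.asarray(sigma, float))

def reliability_table(p, below, n_bins=5):
    """Bin predicted probabilities and compare each bin's mean prediction with the
    observed frequency of {f(x) <= t}. Returns (mean_pred, obs_freq, count) per non-empty bin."""
    p = np.asarray(p, float)
    below = np.asarray(below, float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # interior edges only, so p == 1.0 falls into the last bin
    idx = np.clip(np.digitize(p, edges[1:-1]), 0, n_bins - 1)
    rows = []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            rows.append((p[mask].mean(), below[mask].mean(), int(mask.sum())))
    return rows
```

For a well-calibrated model, the first two entries of each row should be close. Miscalibration below t appears as a systematic gap in the bins that matter for the search.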
If this is right
- The expected-improvement optimizer that uses tcGP remains dense in the design space.
- Lower-tail calibration of the predictive distributions improves relative to both standard and globally calibrated Gaussian processes.
- Bayesian optimization performance improves on standard benchmark functions compared with uncalibrated and globally calibrated models.
Where Pith is reading between the lines
- Selective lower-tail calibration may preserve useful upper-tail behavior that global calibration would alter.
- The same spatial-calibration ideas could be tested on acquisition functions other than expected improvement that also depend on lower-tail accuracy.
- Because the method works with existing maximum-likelihood fits, it can be inserted into current Gaussian-process Bayesian optimization pipelines without retraining.
Load-bearing premise
The post-hoc adjustment can be applied to ordinary maximum-likelihood Gaussian processes without creating inconsistencies that would destroy the density property of the resulting expected-improvement optimizer.
What would settle it
An experiment in which the expected-improvement algorithm driven by tcGP-calibrated predictions either fails to explore the design space densely or produces worse optimization performance than a standard uncalibrated Gaussian process on a benchmark function.
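One way to probe the "explores densely" half of this criterion empirically is to track the fill distance of the evaluation points, meaning the largest distance from any point of the design space to its nearest evaluation. For a sequence in a compact 𝕏, density is equivalent to this quantity tending to zero. The sketch below approximates it on a finite candidate grid, which only lower-bounds the true supremum. It is a generic diagnostic assumed here for illustration, not an experiment reported in the paper.

```python
import numpy as np

def _as_points(a):
    """Treat a 1-D array as n points in dimension 1; leave (n, d) arrays unchanged."""
    a = np.asarray(a, float)
    return a.reshape(-1, 1) if a.ndim == 1 else a

def fill_distance(design, candidates):
    """Largest distance from any candidate point to its nearest design point,
    i.e. a grid approximation of sup_x min_i ||x - x_i||."""
    design, candidates = _as_points(design), _as_points(candidates)
    # pairwise Euclidean distances, shape (n_candidates, n_design)
    d = np.linalg.norm(candidates[:, None, :] - design[None, :, :], axis=-1)
    return d.min(axis=1).max()
```

Plotting this value against the iteration count for the EI runs would show whether the evaluations keep filling the domain or stall in one basin.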
Abstract
Bayesian optimization (BO) selects evaluation points for expensive black-box objectives using Gaussian process (GP) predictive distributions. Kernel choice and hyperparameter selection can lead to miscalibrated predictive distributions and an inappropriate exploration-exploitation trade-off. For minimization, sampling criteria such as expected improvement (EI) depend on the predictive distribution below the current best value, so lower-tail miscalibration directly affects the sampling decision. This article studies goal-oriented calibration of GP predictive distributions below a low threshold $t$ in the noiseless setting, for standard GP models with hyperparameters selected by maximum likelihood. A framework for predictive reliability below $t$ is introduced, based on two notions of spatial calibration: occurrence calibration over the design space and thresholded $\mu$-calibration on sublevel sets of the form $\{x\in\mathbb{X} : f(x)\le t\}$. Building on this framework, we propose tcGP, a post-hoc method that calibrates GP predictive distributions below $t$, and we show that the resulting EI-based global optimization algorithm remains dense in the design space. Experiments on standard benchmarks show improved lower-tail calibration and BO performance relative to standard GP models and globally calibrated GP models.
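The standard closed form of EI under a Gaussian predictive distribution (Jones et al., 1998) shows why the abstract singles out the lower tail. EI(x) = E[max(m − ξ(x), 0)] integrates the improvement m − y only over outcomes y below the current best value m, so the rest of the predictive distribution plays no role. The sketch below checks the closed form against a direct integral over that lower tail. It uses a plain, uncalibrated Gaussian predictive and does not implement tcGP.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def expected_improvement(mu, sigma, best):
    """Closed-form EI for minimization under N(mu, sigma^2):
    (best - mu) * Phi(z) + sigma * phi(z), with z = (best - mu) / sigma."""
    mu = np.asarray(mu, float)
    sigma = np.maximum(np.asarray(sigma, float), 1e-12)  # guard: zero variance at observed points
    z = (best - mu) / sigma
    return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def expected_improvement_quad(mu, sigma, best):
    """Same quantity by numerical integration over the lower tail y < best only."""
    integrand = lambda y: (best - y) * norm.pdf(y, loc=mu, scale=sigma)
    value, _ = quad(integrand, -np.inf, best)
    return value
```

Any recalibration that changes the predictive mass below m therefore changes EI directly, while changes above m leave it untouched.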
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes tcGP, a post-hoc goal-oriented calibration method for the lower tail of Gaussian process predictive distributions below a threshold t in the noiseless setting. It introduces a spatial calibration framework with occurrence calibration and thresholded μ-calibration on sublevel sets, shows that the resulting EI-based global optimization algorithm remains dense in the design space, and reports experimental improvements in lower-tail calibration and BO performance over standard GPs and globally calibrated models.
Significance. If the density result holds under sequential updates, the work provides a theoretically supported way to address lower-tail miscalibration in GP-based BO without altering the core model or MLE fitting. The experimental gains on benchmarks suggest practical value for improving the exploration-exploitation balance in minimization tasks. The explicit density claim strengthens the contribution, but the claim relies on the post-hoc adjustment preserving the regularity properties it needs.
major comments (2)
- [Theoretical results on density (likely §4 or §5)] The central density claim for the EI-based optimizer under tcGP (stated in the abstract and presumably proved in the theoretical section) relies on properties of the calibrated predictive distribution. However, the argument may assume a fixed t, while sequential BO updates t to the current incumbent after each observation and thereby redefines the calibrated sublevel sets. Please specify the section containing the density proof and clarify whether the continuity, strict positivity away from observed points, and uniform continuity arguments are re-verified after each such data-dependent update, or whether an extension is provided.
- [Framework and tcGP definition (likely §3)] Abstract and introduction claim that tcGP is applied to standard GP models with MLE hyperparameters without introducing inconsistencies that invalidate the density property. However, since calibration parameters are fit post-hoc (potentially to the same data), it is unclear how this preserves the required predictive properties for the density argument. Cite the specific result or lemma that shows the calibrated mean/variance functions remain sufficiently regular for the EI acquisition to satisfy the conditions for density in the design space.
minor comments (2)
- [Abstract and §1] The abstract mentions the "noiseless setting" but does not clarify whether the framework or experiments extend to noisy observations. If the paper is restricted to the noiseless case, this should be stated more prominently in the introduction.
- [Notation section] Notation for the threshold t and sublevel sets {x : f(x) ≤ t} is introduced but could be made more consistent when t is updated to the incumbent value; consider adding a remark on the time-dependence.
Simulated Author's Rebuttal
We thank the referee for the careful reading and insightful comments on the theoretical foundations of tcGP. We address each major comment below with clarifications drawn directly from the manuscript and commit to revisions that improve the presentation without altering the core claims.
Point-by-point responses
-
Referee: [Theoretical results on density (likely §4 or §5)] The central density claim for the EI-based optimizer under tcGP (stated in the abstract and presumably proved in the theoretical section) relies on properties of the calibrated predictive distribution. However, the argument may assume a fixed t, while sequential BO updates t to the current incumbent after each observation and thereby redefines the calibrated sublevel sets. Please specify the section containing the density proof and clarify whether the continuity, strict positivity away from observed points, and uniform continuity arguments are re-verified after each such data-dependent update, or whether an extension is provided.
Authors: The density result for the EI-based optimizer is established in Section 4. The proof is stated for a general fixed threshold t and relies on the calibrated predictive mean and variance satisfying continuity, strict positivity of variance away from observed points, and uniform continuity on compact sets. In the sequential BO procedure, t is set to the current incumbent at each iteration and the calibration is re-applied using that t; because the calibration map is continuous in t and the underlying GP kernel properties are unchanged, the same regularity conditions hold at every step. We will insert a short remark in Section 4 explicitly noting that the argument applies iteratively with the data-dependent t. revision: partial
-
Referee: [Framework and tcGP definition (likely §3)] Abstract and introduction claim that tcGP is applied to standard GP models with MLE hyperparameters without introducing inconsistencies that invalidate the density property. However, since calibration parameters are fit post-hoc (potentially to the same data), it is unclear how this preserves the required predictive properties for the density argument. Cite the specific result or lemma that shows the calibrated mean/variance functions remain sufficiently regular for the EI acquisition to satisfy the conditions for density in the design space.
Authors: Section 3 defines tcGP via occurrence calibration and thresholded μ-calibration applied only on the sublevel set {x : f(x) ≤ t}. The post-hoc adjustment is constructed so that the calibrated mean remains continuous and the calibrated variance remains strictly positive away from observed points whenever the original GP satisfies these properties (which it does under standard MLE fitting with a continuous kernel). This preservation is stated as part of the framework in Section 3 and is used directly to invoke the density theorem in Section 4. We will add an explicit forward reference to the relevant paragraph in Section 3 both in the abstract and at the end of the introduction. revision: yes
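The exchange above turns on the sequential structure of the algorithm. The minimal EI loop below makes that structure concrete: at every iteration the threshold t is reset to the current incumbent min y, and a calibration hook is re-applied with that t before EI is computed. This is a sketch under several assumptions. The GP has fixed hyperparameters (no MLE refit), a squared-exponential kernel, and a zero prior mean. EI is maximized over a grid on [0, 1]. The `calibrate` argument is a hypothetical placeholder (identity by default) where a lower-tail calibration such as tcGP would plug in. None of these names come from the paper or from the gpmp package.

```python
import numpy as np
from scipy.linalg import solve_triangular
from scipy.stats import norm

def rbf(a, b, lengthscale=0.2):
    """Squared-exponential kernel with unit variance on 1-D inputs."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / lengthscale) ** 2)

def gp_posterior(x_obs, y_obs, x_new, lengthscale=0.2, jitter=1e-8):
    """Noiseless GP posterior mean and standard deviation (zero prior mean, fixed hyperparameters)."""
    K = rbf(x_obs, x_obs, lengthscale) + jitter * np.eye(len(x_obs))
    L = np.linalg.cholesky(K)
    k_star = rbf(x_obs, x_new, lengthscale)                    # shape (n_obs, n_new)
    alpha = solve_triangular(L.T, solve_triangular(L, y_obs, lower=True), lower=False)
    v = solve_triangular(L, k_star, lower=True)
    mean = k_star.T @ alpha
    var = np.clip(1.0 - np.sum(v ** 2, axis=0), 0.0, None)    # k(x, x) = 1 for this kernel
    return mean, np.sqrt(var)

def identity_calibration(mean, std, t):
    """Hypothetical hook for a post-hoc lower-tail calibration; returns its inputs unchanged."""
    return mean, std

def ei(mean, std, best):
    std = np.maximum(std, 1e-12)
    z = (best - mean) / std
    return (best - mean) * norm.cdf(z) + std * norm.pdf(z)

def run_bo(f, x_init, n_iter=8, grid=None, calibrate=identity_calibration):
    grid = np.linspace(0.0, 1.0, 201) if grid is None else grid
    x = np.asarray(x_init, float)
    y = f(x)
    thresholds = []
    for _ in range(n_iter):
        t = y.min()                                  # threshold = current incumbent
        thresholds.append(t)
        mean, std = gp_posterior(x, y, grid)
        mean, std = calibrate(mean, std, t)          # re-applied with the updated t each step
        x_next = grid[np.argmax(ei(mean, std, t))]
        x = np.append(x, x_next)
        y = np.append(y, f(np.array([x_next])))
    return x, y, thresholds
```

For example, `run_bo(lambda z: (z - 0.3) ** 2, [0.0, 0.5, 1.0])` returns the design, the observations, and the nonincreasing sequence of thresholds used at each step. This sequence is the data-dependent t that the referee asks the density argument to handle.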
Circularity Check
No significant circularity in derivation of tcGP density property
The paper defines a spatial calibration framework (occurrence calibration and thresholded μ-calibration on sublevel sets) and applies post-hoc tcGP calibration to standard MLE-fitted GPs in the noiseless case. The claim that the resulting EI optimizer remains dense follows from continuity and positivity properties of the calibrated predictive distributions on compact sets. It does not reduce the density result to a fitted parameter renamed as a prediction, nor to a self-citation chain. No equations equate the target density statement to its inputs by construction, and the framework is presented as an independent extension rather than an ansatz imported from prior self-work. The derivation is self-contained relative to the introduced calibration notions and standard GP-EI arguments.
Axiom & Free-Parameter Ledger
free parameters (1)
- lower-tail calibration parameters
axioms (1)
- domain assumption: noiseless observations and a standard GP with MLE hyperparameters
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (tag: unclear)
unclear: Relation between the paper passage and the cited Recognition theorem.
Paper passage: "We model $R_n(X, f(X))$ with a generalized normal (GN) family... $J(\beta,\lambda) = \sup \left| P_n\left(U^{t}_{\beta,\lambda} \le u \mid f(X) \le t\right) - u\,\kappa^{t}_{\beta,\lambda} \right|$"
-
IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (tag: unclear)
unclear: Relation between the paper passage and the cited Recognition theorem.
Paper passage: "Proposition 5.3... $(X_n)$ is dense in $\mathbb{X}$ (NEB property, fixed hyperparameters, compact parameter box for $(\beta_n, \lambda_n)$)"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
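The first quoted paper passage models a residual with a generalized normal family and scores candidate parameters (β, λ) with a sup-type discrepancy conditioned on {f(X) ≤ t}. The exact definitions of R_n, U^t_{β,λ} and κ^t_{β,λ} are not given on this page, so the sketch below only illustrates the ingredients, under the following assumptions:

- The predictive at x is taken as μ(x) + σ(x)·GN(β, λ), using scipy.stats.gennorm (the SciPy reference the paper cites).
- PIT values are conditioned on the sublevel event as U = F_x(f(x)) / F_x(t).
- The discrepancy is the Kolmogorov-Smirnov distance to the uniform law.

This is not the paper's criterion J. In particular, the κ factor is omitted. With β = 2 and λ = √2, the GN family reduces to the standard Gaussian.

```python
import numpy as np
from scipy.stats import gennorm, kstest

def gn_cdf(y, mu, sigma, beta, lam):
    """Assumed recalibrated predictive at x: (Y - mu) / sigma ~ GN(shape=beta, scale=lam)."""
    return gennorm.cdf((np.asarray(y, float) - mu) / sigma, beta, scale=lam)

def thresholded_pit(y, mu, sigma, t, beta, lam):
    """PIT values conditioned on {y <= t}: U = F_x(y) / F_x(t), kept only for observations below t.
    If the predictive is calibrated, U is uniform on [0, 1] given the event {y <= t}."""
    y, mu, sigma = np.broadcast_arrays(*(np.asarray(a, float) for a in (y, mu, sigma)))
    below = y <= t
    f_y = gn_cdf(y[below], mu[below], sigma[below], beta, lam)
    f_t = gn_cdf(t, mu[below], sigma[below], beta, lam)
    return f_y / f_t

def sup_uniform_distance(u):
    """Kolmogorov-Smirnov distance sup_u |ECDF(u) - u| to the uniform law on [0, 1]."""
    u = np.sort(np.asarray(u, float))
    n = u.size
    i = np.arange(1, n + 1)
    return max(np.max(i / n - u), np.max(u - (i - 1) / n))

# Example: outcomes simulated from the same GN model, so the thresholded PIT should look uniform.
rng = np.random.default_rng(0)
n = 4000
mu = rng.uniform(-1.0, 1.0, n)
sigma = np.full(n, 0.5)
y = mu + sigma * gennorm.rvs(1.5, scale=0.8, size=n, random_state=rng)
u = thresholded_pit(y, mu, sigma, t=0.0, beta=1.5, lam=0.8)
d = sup_uniform_distance(u)
d_scipy = kstest(u, "uniform").statistic   # SciPy computes the same statistic
```

Minimizing such a discrepancy over a compact box of (β, λ) values is one plausible reading of the quoted criterion. The paper's own objective should be taken from the paper itself.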
Reference graph
Works this paper leans on
- [1] Allen, S., Bhend, J., Martius, O., and Ziegel, J. Weighted verification tools to evaluate univariate and multivariate probabilistic forecasts for high-impact weather events. Weather and Forecasting, 38(3):499-516, 2023.
- [2] Allen, S., Koh, J., Segers, J., and Ziegel, J. Tail calibration of probabilistic forecasts. J. Amer. Statist. Assoc., 120(552):2796-2808, 2025.
- [3] Auer, P., Cesa-Bianchi, N., and Fischer, P. Finite-time analysis of the multiarmed bandit problem. Mach. Learn., 47:235-256, 2002.
- [4] Bect, J., Li, L., and Vazquez, E. Bayesian subset simulation. SIAM/ASA Journal on Uncertainty Quantification, 5(1):762-786, 2017.
- [5] Bogunovic, I. and Krause, A. Misspecified Gaussian process bandit optimization. In Ranzato, M., Beygelzimer, A., Dauphin, Y., Liang, P., and Vaughan, J. W. (eds.), Adv. Neural Inf. Process. Syst., volume 34, pp. 3004-3015. Curran Associates, Inc., 2021.
- [6] Bull, A. D. Convergence rates of efficient global optimization algorithms. J. Mach. Learn. Res., 12(88):2879-2904, 2011.
- [7] Chu, W. and Ghahramani, Z. Gaussian processes for ordinal regression. J. Mach. Learn. Res., 6(35):1019-1041, 2005.
- [8] Deshpande, S., Marx, C., and Kuleshov, V. Online calibrated and conformal prediction improves Bayesian optimization. In Dasgupta, S., Mandt, S., and Li, Y. (eds.), Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, volume 238 of Proc. Mach. Learn. Res., pp. 1450-1458. PMLR, 02-04 May 2024.
- [9] Feliot, P., Bect, J., and Vazquez, E. A Bayesian approach to constrained single- and multi-objective optimization. J. Global Optim., 67(1-2):97-133, April 2016.
- [10] Forrester, A. I. J., Sóbester, A., and Keane, A. J. Engineering Design via Surrogate Modelling: A Practical Guide. John Wiley & Sons, 2008.
- [11] Friedli, L., Gautier, A., Broccard, A., and Ginsbourger, D. CRPS-based targeted sequential design with application in chemical space, 2025. URL https://arxiv.org/abs/2503.11250
- [12] Gneiting, T. and Resin, J. Regression diagnostics meets forecast evaluation: conditional calibration, reliability diagrams, and coefficient of determination. Electron. J. Stat., 17(2), January 2023.
- [13] Gneiting, T., Balabdaoui, F., and Raftery, A. E. Probabilistic forecasts, calibration and sharpness. J. R. Stat. Soc. Ser. B Stat. Methodol., 69(2):243-268, 2007.
- [14] Gneiting, T. G. and Raftery, A. E. Strictly proper scoring rules, prediction, and estimation. J. Am. Stat. Assoc., 102(477):359-378, 2007.
- [15] Guo, Z., Ong, Y. S., and Liu, H. Calibrated and recalibrated expected improvements for Bayesian optimization. Struct. Multidiscip. Optim., 64:3549-3567, 2021.
- [16] Jin, Y. and Ren, Z. Confidence on the focal: conformal prediction with selection-conditional coverage. J. R. Stat. Soc. Ser. B Stat. Methodol., 87(4):1239-1259, April 2025.
- [17] Jones, D., Schonlau, M., and Welch, W. Efficient global optimization of expensive black-box functions. J. Global Optim., 13:455-492, December 1998.
- [18] Kim, D., Zecchin, M., Park, S., Kang, J., and Simeone, O. Robust Bayesian optimization via localized online conformal prediction. IEEE Trans. Signal Process., 73:2039-2052, 2025.
- [19] Lai, T. L. and Robbins, H. Asymptotically efficient adaptive allocation rules. Adv. Appl. Math., 6(1):4-22, 1985.
- [20] Matheson, J. E. and Winkler, R. L. Scoring rules for continuous probability distributions. Manag. Sci., 22(10):1087-1096, 1976.
- [21] Meeker, W. Q., Hahn, G. J., and Escobar, L. A. Statistical Intervals: A Guide for Practitioners and Researchers. John Wiley & Sons, Hoboken, New Jersey, second edition, 2017. ISBN 978-0-471-68717-7.
- [22] Mockus, J., Tiesis, V., and Zilinskas, A. The application of Bayesian methods for seeking the extremum. In Dixon, L. C. W. and Szegö, G. P. (eds.), Towards Global Optimisation, volume 2, pp. 117-129. North-Holland, Amsterdam, 1978.
- [23] Nadarajah, S. A generalized normal distribution. J. Appl. Stat., 32(7):685-694, 2005.
- [24] Neiswanger, W. and Ramdas, A. Uncertainty quantification using martingales for misspecified Gaussian processes. In Feldman, V., Ligett, K., and Sabato, S. (eds.), Proceedings of the 32nd International Conference on Algorithmic Learning Theory, volume 132 of Proc. Mach. Learn. Res., pp. 963-982. PMLR, 16-19 Mar 2021.
- [25] Nelder, J. A. and Mead, R. A simplex method for function minimization. The Computer Journal, 7(4):308-313, January 1965.
- [26] Petit, S. J., Bect, J., and Vazquez, E. Relaxed Gaussian process interpolation: a goal-oriented approach to Bayesian optimization. Journal of Machine Learning Research, 26(195):1-70, 2025.
- [27] Picheny, V., Vakili, S., and Artemev, A. Ordinal Bayesian optimisation, 2019. URL https://arxiv.org/abs/1912.02493
- [28] Picheny, V., Moss, H., Torossian, L., and Durrande, N. Bayesian quantile and expectile optimisation. In Cussens, J. and Zhang, K. (eds.), Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence, volume 180 of Proc. Mach. Learn. Res., pp. 1623-1633. PMLR, 01-05 Aug 2022.
- [29] Pion, A. and Vazquez, E. Design-marginal calibration of Gaussian process predictive distributions: Bayesian and conformal approaches, 2025. URL https://arxiv.org/abs/2512.05611
- [30] Sahoo, R., Zhao, S., Chen, A., and Ermon, S. Reliable decisions with threshold calibration. In Ranzato, M., Beygelzimer, A., Dauphin, Y., Liang, P., and Vaughan, J. W. (eds.), Adv. Neural Inf. Process. Syst., volume 34, pp. 1831-1844. Curran Associates, Inc., 2021.
- [31] Schonlau, M. and Welch, W. J. Global optimization with nonparametric function fitting. In Proceedings of the ASA, Section on Physical and Engineering Sciences, pp. 183-186. Amer. Statist. Assoc., 1996.
- [32] SciPy Developers. SciPy Project, 2026. URL https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gennorm.html. SciPy API Reference, accessed 2026-01.
- [33] Scott, D. W. Multivariate Density Estimation: Theory, Practice, and Visualization. John Wiley & Sons, New York, 1992.
- [34] Shimodaira, H. Improving predictive inference under covariate shift by weighting the log-likelihood function. J. Statist. Plann. Inference, 90(2):227-244, 2000.
- [35] Srinivas, N., Krause, A., Kakade, S. M., and Seeger, M. Gaussian process optimization in the bandit setting: No regret and experimental design. In Proc. 27th International Conference on Machine Learning (ICML 2010), pp. 1015-1022, 2010.
- [36] Stanton, S., Maddox, W., and Wilson, A. G. Bayesian optimization with conformal prediction sets. In Ruiz, F., Dy, J., and van de Meent, J. W. (eds.), Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, volume 206 of Proc. Mach. Learn. Res., pp. 959-986. PMLR, 25-27 Apr 2023.
- [37] Stein, M. L. Interpolation of Spatial Data: Some Theory for Kriging. Springer Ser. Stat. Springer New York, 1999.
- [38] Surjanovic, S. and Bingham, D. Virtual library of simulation experiments: Test functions and datasets, 2013. URL https://www.sfu.ca/~ssurjano/. Accessed November 2025.
- [39] Tom, G., Lo, S., Corapi, S., Aspuru-Guzik, A., and Sanchez-Lengeling, B. Ranking over regression for Bayesian optimization and molecule selection. APL Machine Learning, 3(3):036113, August 2025.
- [40] Tuo, R. and Wang, W. Uncertainty quantification for Bayesian optimization. In Camps-Valls, G., Ruiz, F. J. R., and Valera, I. (eds.), Proceedings of the 25th International Conference on Artificial Intelligence and Statistics (AISTATS), volume 151 of Proc. Mach. Learn. Res., pp. 2862-2884. PMLR, 2022.
- [41] Vazquez, E. Gpmp: the Gaussian process micro package, 2026. URL https://github.com/gpmp-dev/gpmp
- [42] Vazquez, E. and Bect, J. Convergence properties of the expected improvement algorithm with fixed mean and covariance functions. J. Statist. Plann. Inference, 140:3088-3095, November 2010.
- [43] Villemonteix, J., Vazquez, E., and Walter, E. An informational approach to the global optimization of expensive-to-evaluate functions. J. Global Optim., 44(4):509-534, 2009.
- [44] Vovk, V., Gammerman, A., and Shafer, G. Algorithmic Learning in a Random World. Springer, 2005.
- [45] Vovk, V., Shen, J., Manokhin, V., and Xie, M. Nonparametric predictive distributions based on conformal prediction. Mach. Learn., 108(3):445-474, 2019.
- [46] Zhang, Y. and Candès, E. J. Posterior conformal prediction, 2024. URL https://arxiv.org/abs/2409.19712