Prediction Using a Bayesian Heteroscedastic Composite Gaussian Process

Casey B. Davis; Christopher M. Hans; Thomas J. Santner

arxiv: 1906.10737 · v1 · pith:FHKSBT62new · submitted 2019-06-25 · 📊 stat.ME

Pith reviewed 2026-05-25 16:14 UTC · model grok-4.3

classification 📊 stat.ME

keywords Gaussian process · composite model · heteroscedasticity · Bayesian prediction · non-stationary · MCMC · prediction intervals · variance modeling

The pith

A Bayesian extension to the composite Gaussian process adds an input-dependent variance term to predict both stationary and non-stationary responses.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds a Bayesian model that replaces the usual regression term in a Gaussian process setup with a global GP for large-scale trends. An independent local GP captures finer deviations, while a separate process lets the variance of the response change with the input values. A prior is chosen so the global trend stays smoother than the local part, and the covariance is modified to let the two components receive different weights. MCMC sampling produces posterior estimates of all parameters and of the variance at both training and new points, which are then used to form predictions and intervals. The approach is shown to apply whether the underlying response is stationary or not.

Core claim

The model Y(x) extends the composite Gaussian process by including a heteroscedastic component whose variance depends on the inputs. Large-scale trends are estimated by one Gaussian process and local trends by an independent second process. A prior is introduced that keeps the fitted global mean smoother than the local deviations, and the covariance structure is extended so the global and local components can be weighted differently. Markov chain Monte Carlo sampling yields the full posterior, from which predictions and prediction intervals are obtained for both stationary and non-stationary responses.
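
To make the three-part layout concrete, here is a minimal Python sketch; it is not the authors' implementation. It draws one realization from a composite prior in one dimension, assuming squared-exponential correlations, a hypothetical weight `w` that splits unit variance between the global and local parts, and a made-up deterministic standard-deviation function `sigma` standing in for the variance process. All names and numeric values are illustrative assumptions.

```python
import numpy as np

def sq_exp_corr(x, length_scale):
    """Squared-exponential correlation matrix for 1-D inputs x."""
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def draw_gp(rng, cov, jitter=1e-6):
    """Draw one zero-mean Gaussian vector with covariance cov (plus jitter)."""
    n = cov.shape[0]
    chol = np.linalg.cholesky(cov + jitter * np.eye(n))
    return chol @ rng.standard_normal(n)

def simulate_composite(x, w=0.7, ls_global=0.5, ls_local=0.05, seed=0):
    """One prior draw: smooth global part, rougher local part, varying scale."""
    rng = np.random.default_rng(seed)
    # Global component: long length scale, weight w (assumed values).
    y_global = draw_gp(rng, w * sq_exp_corr(x, ls_global))
    # Local component: short length scale, weight 1 - w, drawn independently.
    y_local = draw_gp(rng, (1.0 - w) * sq_exp_corr(x, ls_local))
    # Hypothetical input-dependent standard deviation, so Var(Y(x)) is
    # approximately sigma(x)^2 because the two weights sum to one.
    sigma = 0.5 + 1.5 * x
    y = sigma * (y_global + y_local)
    return y, y_global, y_local, sigma

x = np.linspace(0.0, 1.0, 50)
y, y_global, y_local, sigma = simulate_composite(x)
```

In this sketch the ordering of smoothness comes only from fixing `ls_global` larger than `ls_local`; in the paper that ordering is governed by a prior and the variance is itself a modeled process.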

What carries the argument

The Bayesian heteroscedastic composite Gaussian process, which combines a global trend GP, an independent local deviation GP, and an input-dependent variance process under a smoothness-enforcing prior and differentially weighted covariance.

If this is right

The model produces predictions and uncertainty intervals for both stationary and non-stationary responses.
Posterior samples give estimates of the heteroscedastic variance at every training and test location.
Differential weighting of the global and local components is available through the extended covariance.
Markov chain Monte Carlo supplies the full posterior over all model parameters.
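
A hedged sketch of how posterior draws can be turned into predictions and intervals follows. It assumes that, for each retained MCMC draw, the conditional predictive mean and variance at every test point have already been computed and stored in arrays `mean_draws` and `var_draws`, and that the conditional predictive distribution is normal. The function name and argument layout are hypothetical, not taken from the paper.

```python
import numpy as np

def predictive_summary(mean_draws, var_draws, level=0.95, seed=0):
    """Summarize a posterior predictive distribution from MCMC output.

    mean_draws, var_draws: arrays of shape (n_draws, n_points) holding the
    conditional predictive mean and variance for each posterior draw.
    """
    mean_draws = np.asarray(mean_draws, dtype=float)
    var_draws = np.asarray(var_draws, dtype=float)
    rng = np.random.default_rng(seed)
    # One predictive sample per posterior draw, so the intervals reflect both
    # parameter uncertainty and the (possibly input-dependent) variance.
    y_rep = mean_draws + np.sqrt(var_draws) * rng.standard_normal(mean_draws.shape)
    alpha = 1.0 - level
    lower, upper = np.quantile(y_rep, [alpha / 2.0, 1.0 - alpha / 2.0], axis=0)
    # Point prediction: average of the conditional means over draws.
    return mean_draws.mean(axis=0), lower, upper
```

Averaging the conditional means rather than the sampled `y_rep` values gives a less noisy point predictor, while the quantiles of `y_rep` give equal-tailed intervals.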

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same structure could be tested on spatial data sets where measurement error visibly changes across the domain.
Allowing the local process to carry its own length-scale parameters might further increase flexibility without changing the overall three-process layout.
Direct comparison of out-of-sample interval coverage on simulated data with known input-dependent variance would quantify the benefit of the third process.
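
The coverage comparison suggested in the last item can be illustrated with a small Python sketch. To isolate the effect of the variance model, it treats the mean and the input-dependent standard deviation as known rather than fitted, so it shows only what a constant-variance interval loses and is not a test of the paper's model. The test function, the noise profile and all names are assumptions made for the example.

```python
import numpy as np

def empirical_coverage(y, lower, upper):
    """Fraction of held-out responses that fall inside their intervals."""
    y, lower, upper = map(np.asarray, (y, lower, upper))
    return float(np.mean((y >= lower) & (y <= upper)))

def coverage_demo(n=4000, seed=1):
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, n)
    mu = np.sin(2.0 * np.pi * x)           # mean treated as known (assumption)
    sd = 0.2 + 2.0 * x                     # known input-dependent sd (assumption)
    y = mu + sd * rng.standard_normal(n)
    z = 1.959964                           # 97.5% standard normal quantile
    pooled_sd = np.sqrt(np.mean(sd ** 2))  # one variance shared by all inputs
    regions = {
        "all": np.ones(n, dtype=bool),
        "low_noise": x < 0.25,
        "high_noise": x > 0.75,
    }
    scales = {"hetero": sd, "pooled": np.full(n, pooled_sd)}
    out = {}
    for region, mask in regions.items():
        for label, s in scales.items():
            half = z * s[mask]
            out[f"{label}_{region}"] = empirical_coverage(
                y[mask], mu[mask] - half, mu[mask] + half
            )
    return out

coverage = coverage_demo()
```

Reporting coverage by region matters here: a pooled interval can be close to nominal on average while over-covering where the noise is small and under-covering where it is large.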

Load-bearing premise

The prior that forces the global mean to be smoother than the local deviations is appropriate for the data at hand.
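
One way a smoothness ordering of this kind could be encoded is sketched below in Python. This is an illustrative assumption, not the prior defined in the paper: correlations take a hypothetical power form, the product over inputs of `rho_k ** (difference_k ** 2)`, where `rho` closer to 1 means a smoother process, and each local `rho` is drawn as a Beta-scaled fraction of its global counterpart so that the local value never exceeds the global one. Function names and Beta shape parameters are made up for the example.

```python
import numpy as np

def sample_ordered_rho(n_draws, dim, a=1.0, b=1.0, seed=0):
    """Draw (rho_global, rho_local) pairs with rho_local <= rho_global."""
    rng = np.random.default_rng(seed)
    rho_global = rng.beta(a, b, size=(n_draws, dim))
    # Scaling by a second Beta draw keeps the local value in (0, rho_global).
    rho_local = rho_global * rng.beta(a, b, size=(n_draws, dim))
    return rho_global, rho_local

def power_corr(x1, x2, rho):
    """Correlation matrix with entries prod_k rho_k ** ((x1_k - x2_k) ** 2)."""
    d2 = (x1[:, None, :] - x2[None, :, :]) ** 2
    return np.prod(rho ** d2, axis=-1)
```

Under this parametrization a smaller `rho` makes correlation decay faster with distance, so the constraint guarantees that the local process is at least as rough as the global one in every input direction.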

What would settle it

On data generated from a process whose global trend is rougher than its local deviations, the model would either fail to separate the components or produce worse predictions than a standard stationary Gaussian process.

Figures

Figures reproduced from arXiv: 1906.10737 by Casey B. Davis, Christopher M. Hans, Thomas J. Santner.

**Figure 1.** Kriging predictors (red lines) for the BJX function (black lines) given in equation (2) based on the training data shown as black points together with 95% prediction intervals. Left Panel: constant mean; Right Panel: cubic mean.

**Figure 2.** Predictions (in red) of the BJX test function $y(x)$ in (2) and associated 95% uncertainty intervals (as a gray shadow) based on the CGP model. The dashed blue line is the estimate of the global component $Y_G(x)$ under the CGP model.

**Figure 3.** Prediction and 95% uncertainty bounds for the BJX function, $y(x)$, in Example 4.1 (solid black): the BCGP predictor of $y(x)$ (solid blue); 95% UQ limits of $y(x)$ (dashed blue); estimated posterior mean of the $Y_G(x)$ process (solid green); estimated posterior mean of the $Y_L(x)$ process (solid magenta).

**Figure 4.** Marginal plots of state rate of heat transfer versus $x_1$, $x_2$, $x_3$, $x_4$ for Example 4.2.

**Figure 5.** Boxplots of the posterior draws of all BCGP model parameters for Example 4.2.

**Figure 6.** Predicted versus simulated values for the 24 steady state heat exchange inputs of Example 4.2.

**Figure 7.** Predicted wing weight versus calculated wing weight for 150 test inputs based on 50 training inputs from a maximin LHD.

**Figure 8.** Boxplots of the predicted global trend function for the wing weight function, $\hat{y}_G(x)$, based on grouped $x_8$ and $x_4$ values for 150 test inputs.

Original abstract

This research proposes a flexible Bayesian extension of the composite Gaussian process (CGP) model of Ba and Joseph (2012) for predicting (stationary or) non-stationary $y(\mathbf{x})$. The CGP generalizes the regression plus stationary Gaussian process (GP) model by replacing the regression term with a GP. The new model, $Y(\mathbf{x})$, can accommodate large-scale trends estimated by a global GP, local trends estimated by an independent local GP, and a third process to describe heteroscedastic data in which $Var(Y(\mathbf{x}))$ can depend on the inputs. This paper proposes a prior which ensures that the fitted global mean is smoother than the local deviations, and extends the covariance structure of the CGP to allow for differentially-weighted global and local components. A Markov chain Monte Carlo algorithm is proposed to provide posterior estimates of the parameters, including the values of the heteroscedastic variance at the training and test data locations. The posterior distribution is used to make predictions and to quantify the uncertainty of the predictions using prediction intervals. The method is illustrated using both stationary and non-stationary $y(\mathbf{x})$.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Bayesian extension of the 2012 composite GP that adds a third heteroscedastic process, a smoothness prior, and weighted covariances, with MCMC inference.

This paper takes the composite Gaussian process from Ba and Joseph 2012 and makes it Bayesian while adding a third process to capture input-dependent variance. It also introduces a prior that keeps the global GP smoother than the local deviations and lets the covariance weights differ between the global and local parts. MCMC then samples the posterior for parameters and produces prediction intervals at new points. The construction is laid out clearly enough in the abstract to see how the pieces separate large-scale trends, local wiggles, and varying noise without forcing stationarity everywhere. That separation is the main practical addition. The model handles both stationary and non-stationary examples in principle, and the inference route is standard but directly applicable. The prior and weighting choices look like reasonable engineering fixes for the original CGP limitations. The main limitation is that the abstract gives no simulation results, coverage checks, or comparisons against simpler heteroscedastic GPs or the non-Bayesian baseline, so it is hard to judge whether the extra components improve predictions enough to justify the added parameters. Identifiability of the three processes under the new prior is also not discussed. This is for readers already working with composite or multi-scale GPs in spatial statistics or computer experiments who need a Bayesian route to heteroscedasticity. It is a concrete incremental step rather than a broad advance, but the model is specified well enough that a referee could check the derivations and any supplied code or examples. Send it to peer review.

Referee Report

0 major / 3 minor

Summary. The manuscript proposes a Bayesian extension of the composite Gaussian process (CGP) model from Ba and Joseph (2012) for predicting stationary or non-stationary responses y(x). The model Y(x) combines a global GP for large-scale trends, an independent local GP for local deviations, and a third process for input-dependent heteroscedastic variance. It introduces a prior ensuring the fitted global mean is smoother than local deviations, extends the covariance structure to allow differentially-weighted global and local components, and uses an MCMC algorithm to obtain posterior estimates of parameters (including heteroscedastic variance at training and test points) for prediction and interval construction. The approach is illustrated on both stationary and non-stationary functions.

Significance. If the prior construction and MCMC procedure achieve the claimed separation of scales without identifiability problems, the model offers a practical Bayesian framework for non-stationary heteroscedastic prediction with uncertainty quantification. The explicit prior for smoothness ordering and the covariance extension for differential weighting are strengths, as is the provision of an MCMC algorithm for full posterior inference rather than point estimates alone.

minor comments (3)

[Abstract] The claim that the prior 'ensures' the global mean is smoother than local deviations should be cross-referenced to the specific prior definition (likely in the model section) so readers can verify the mechanism.
The manuscript should include a brief discussion of MCMC convergence diagnostics or mixing behavior for the heteroscedastic variance process parameters, as these are central to the prediction intervals.
Notation for the three processes and their covariance kernels should be introduced with a single consistent table or diagram early in the methods to aid readability.
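
As an illustration of the kind of convergence check the second comment asks for, the following Python sketch computes the classic Gelman-Rubin potential scale reduction factor for one scalar parameter from several independent chains. It is a generic diagnostic, not something described in the paper; the function name and the `(n_chains, n_draws)` input layout are assumptions.

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor for one scalar parameter.

    chains: array of shape (n_chains, n_draws) of post-burn-in draws.
    Values near 1 are consistent with the chains sampling the same
    distribution; larger values indicate poor mixing.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    within = chains.var(axis=1, ddof=1).mean()    # mean within-chain variance
    between = n * chain_means.var(ddof=1)         # between-chain variance
    var_hat = (n - 1) / n * within + between / n  # pooled variance estimate
    return float(np.sqrt(var_hat / within))
```

For a variance process sampled at many locations, the same function could be applied to each location's draws separately and the largest value reported.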

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their detailed summary of our work on the Bayesian heteroscedastic composite Gaussian process and for the positive assessment of its significance. The recommendation of minor revision is appreciated. However, the report lists no specific major comments under the MAJOR COMMENTS section.

Circularity Check

0 steps flagged

No significant circularity identified

The paper extends the external CGP model of Ba and Joseph (2012) by proposing a new prior for smoothness ordering between global and local GPs, extending the covariance to allow differential weighting, and adding a third heteroscedastic process. Posterior inference uses a standard MCMC algorithm whose outputs (parameter estimates and prediction intervals) are generated from the joint posterior rather than being algebraically identical to any fitted input. No derivation step reduces a claimed prediction to a fitted quantity by construction, no uniqueness theorem is imported from self-citation, and the model construction is presented as an independent modeling choice whose validity can be assessed against external data. The derivation chain is therefore self-contained.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The model rests on a custom prior for smoothness ordering and an extended covariance kernel; both are introduced without external benchmarks or machine-checked proofs. MCMC fitting implies multiple hyperparameters whose values are not fixed a priori.

free parameters (2)

global and local GP length-scale and variance hyperparameters
Standard GP parameters estimated via MCMC; their specific values are not stated in the abstract.
heteroscedastic variance process parameters
Input-dependent variance parameters introduced as part of the third process and sampled by MCMC.

axioms (1)

domain assumption: a prior exists that forces the global mean to be smoother than the local deviations
Explicitly proposed in the abstract as a modeling choice.

Reference graph

Works this paper leans on

29 extracted references · 29 canonical work pages

[1] Ba, S. and Joseph, V. R. (2012). Composite Gaussian process models for emulating expensive functions. Annals of Applied Statistics, 6(4), 1838–1860.
[2] Ba, S. and Joseph, V. R. (2018). CGP: Composite Gaussian Process Models. R package version 2.1-1.
[3] Banerjee, S., Carlin, B. P., and Gelfand, A. E. (2004). Hierarchical Modeling and Analysis for Spatial Data. Chapman and Hall, New York.
[4] Breiman, L., Friedman, J. H., Olshen, R. A., and Stone, C. J. (1984). Classification and Regression Trees. Chapman & Hall, New York.
[5] Chipman, H. A., George, E. I., and McCulloch, R. E. (1998). Bayesian CART model search. Journal of the American Statistical Association, 93(443), 935–960.
[6] Cressie, N. A. (1993). Statistics for Spatial Data. J. Wiley, New York, First edition.
[7] Davis, C. B. (2015). A Bayesian approach to prediction and variable selection using nonstationary Gaussian processes. Ph.D. thesis, The Ohio State University.
[8] Forrester, A., Sobester, A., and Keane, A. (2008). Engineering design via surrogate modelling: A practical guide. Wiley, Chichester, UK.
[9] Gattiker, J. R. (2008). Gaussian Process models for simulation analysis (GPM/SA) command, function, and data structure reference. Technical Report LA-UR-08-08057, Los Alamos National Laboratory.
[10] Gelman, A., Roberts, G., and Gilks, W. (1996). Efficient Metropolis jumping rules. In J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith, editors, Bayesian Statistics 5: Proceedings of the Fifth Valencia International Meeting, pages 599–608. Oxford University Press, Oxford.
[11] Gramacy, R. B. and Lee, H. K. H. (2008). Bayesian treed Gaussian process models with an application to computer modeling. Journal of the American Statistical Association, 103(483), 1119–1130.
[12] Gramacy, R. B. and Lee, H. K. H. (2012). Cases for the nugget in modeling computer experiments. Statistics and Computing, 22(3), 713–722.
[13] Gu, M., Wang, X., and Berger, J. O. (2018). Robust Gaussian stochastic process emulation. Annals of Statistics, 46, 3038–306
[14] Higdon, D., Kennedy, M., Cavendish, J., Cafeo, J., and Ryne, R. (2004). Combining field data and computer simulations for calibration and prediction. SIAM Journal of Scientific Computing, 26, 448–466.
[15] Higdon, D., Gattiker, J., Williams, B., and Rightley, M. (2008). Computer model calibration using high dimensional output. Journal of the American Statistical Association, 103, 570–583.
[16] Kennedy, M. and O'Hagan, A. (2001). Bayesian calibration of computer models (with discussion). Journal of the Royal Statistical Society Series B, 63, 425–464.
[17] Lempert, R., Schlensinger, M. E., Bankes, S., and Andronova, N. (2000). The impacts of climate variability on near-term policy choices and the value of information. Climate Change, 45, 129–161.
[18] Neal, R. (1998). Regression and classification using Gaussian process priors (with discussion). In J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith, editors, Bayesian Statistics 6: Proceedings of the Sixth Valencia International Meeting, pages 475–501. Oxford University Press, Oxford.
[19] Oakley, J. (2002). Eliciting Gaussian process priors for complex computer codes. Journal of the Royal Statistical Society, Series D, 51(1), 81–97.
[20] Oakley, J. and O'Hagan, A. (2004). Probabilistic sensitivity analysis of complex models: A Bayesian approach. Journal of the Royal Statistical Society Series B, 66, 751–769.
[21] O'Hagan, A. (1978). Curve fitting and optimal design for prediction (with discussion). Journal of the Royal Statistical Society B, 40, 1–42.
[22] Ong, K., Santner, T., and Bartel, D. (2008). Robust design for acetabular cup stability accounting for patient and surgical variability. Journal of Biomechanical Engineering, 130, 1–11.
[23] Qian, P. Z., Seepersad, C. C., Joseph, V. R., Allen, J. K., and Wu, C. F. J. (2006). Building surrogate models with details and approximate simulations. ASME Journal of Mechanical Design, 128, 668–677.
[24] Roberts, G. O. and Rosenthal, J. S. (2001). Optimal scaling for various Metropolis–Hastings algorithms. Statistical Science, 16(4), 351–367.
[25] Roberts, G. O., Gelman, A., and Gilks, W. R. (1997). Weak convergence and optimal scaling of random walk Metropolis algorithms. The Annals of Applied Probability, 7(1), 110–120.
[26] Sacks, J., Welch, W., Mitchell, T., and Wynn, H. (1989). Design and analysis of computer experiments. Statistical Science, 4(4), 409–435.
[27] Santner, T. J., Williams, B. J., and Notz, W. I. (2018). The Design and Analysis of Computer Experiments, Second Edition. Springer Verlag, New York.
[28] Villarreal-Marroquín, M. G., Chen, P.-H., Mulyana, R., Santner, T. J., Dean, A. M., and Castro, J. M. (2017). Multiobjective optimization of injection molding using a calibrated predictor based on physical and simulated data. Polymer Engineering & Science, 57(3), 248–257.
[29] Xiong, Y., Chen, W., Apley, D. W., and Ding, X. (2007). A non-stationary covariance-based kriging method for metamodelling in engineering design. International Journal for Numerical Methods in Engineering, 71, 733–756.
