arxiv: 2604.18045 · v1 · submitted 2026-04-20 · 📊 stat.ME

An ensemble-based approach for multi-fidelity emulation and adaptive sampling

Pith reviewed 2026-05-10 04:17 UTC · model grok-4.3

classification 📊 stat.ME
keywords multi-fidelity emulation · ensemble learning · hierarchical kriging · Bayesian model averaging · adaptive sampling · uncertainty quantification · surrogate modeling · benchmark problems

The pith

Ensemble learning of hierarchical kriging models produces a multi-fidelity emulator whose disagreement measure drives effective adaptive sampling.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops an ensemble approach to multi-fidelity emulation that combines several hierarchical kriging models through Bayesian model averaging. This produces a single predictor for the high-fidelity simulator together with an uncertainty measure whose between-model component is used to select new high-fidelity points. On standard benchmark problems the method is reported to give lower error than any of the individual hierarchical kriging models, and the disagreement signal is reported to improve performance further when the number of expensive evaluations is limited.

Core claim

The central claim is that aggregating hierarchical kriging emulators by Bayesian model averaging yields a multi-fidelity emulator whose between-model variance supplies an informative acquisition function for adaptive design, resulting in more accurate and robust approximations of high-fidelity simulators than single-model alternatives under fixed computational budgets.

What carries the argument

Bayesian model averaging of hierarchical kriging base learners, with the between-model variance component serving as the acquisition criterion for adaptive sampling.
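The quantities named here can be written out explicitly. The following is a sketch in standard BMA notation, assuming $K$ base learners with predictive means $m_k(x)$, predictive variances $\sigma_k^2(x)$ and weights $w_k$. The symbols $\bar{m}$, $\bar{\sigma}^2$ and $\sigma^2_{wm}$ follow the Figure 2 caption; the subscript $bm$ for the between-model term is an assumed label, and the paper's own notation may differ.

```latex
\bar{m}(x) = \sum_{k=1}^{K} w_k \, m_k(x),
\qquad \sum_{k=1}^{K} w_k = 1, \quad w_k \ge 0,

\bar{\sigma}^2(x)
  = \underbrace{\sum_{k=1}^{K} w_k \, \sigma_k^2(x)}_{\sigma^2_{wm}(x)}
  + \underbrace{\sum_{k=1}^{K} w_k \bigl(m_k(x) - \bar{m}(x)\bigr)^2}_{\sigma^2_{bm}(x)},

x_{\text{new}} = \arg\max_{x} \; \sigma^2_{bm}(x).
```

The second line is the law of total variance applied to a mixture of the base learners' predictive distributions; the third line is the acquisition rule as the summary describes it.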

If this is right

  • The multi-fidelity emulator achieves lower prediction error and greater robustness than any single hierarchical kriging model on the tested benchmarks.
  • The adaptive design strategy based on between-model variance selects training points that measurably improve emulator accuracy when the high-fidelity budget is constrained.
  • Uncertainty quantification arises directly from the ensemble and requires no separate variance model.
  • The same framework applies across a collection of established test problems without problem-specific tuning of the fidelity hierarchy.
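The aggregation step behind these points can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names `bma_weights` and `bma_aggregate` are hypothetical, and computing weights from log marginal likelihoods under equal prior model probabilities is an assumption, since the summary does not say how the weights are obtained.

```python
import numpy as np


def bma_weights(log_evidence):
    """Posterior model probabilities from log marginal likelihoods.

    Assumes equal prior probability for every base learner (an assumption,
    not something the summary states).
    """
    log_evidence = np.asarray(log_evidence, dtype=float)
    shifted = log_evidence - log_evidence.max()  # avoid overflow in exp
    unnormalised = np.exp(shifted)
    return unnormalised / unnormalised.sum()


def bma_aggregate(means, variances, weights):
    """Combine K base-learner predictions at N input points.

    means, variances: arrays of shape (K, N)
    weights: array of shape (K,), non-negative and summing to one
    Returns (ensemble mean, within-model variance, between-model variance,
    total variance), each of shape (N,).
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    w = np.asarray(weights, dtype=float)[:, None]  # broadcast over the N points

    ensemble_mean = (w * means).sum(axis=0)
    within = (w * variances).sum(axis=0)
    between = (w * (means - ensemble_mean) ** 2).sum(axis=0)
    return ensemble_mean, within, between, within + between
```

With two equally weighted learners that predict 1 and 3 at some point, the ensemble mean is 2 and the between-model variance is 1; where both predict the same value, the between-model term vanishes and only the within-model variance remains.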

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could be combined with other base learners besides kriging to handle simulators whose response surfaces are poorly captured by Gaussian processes.
  • Because the acquisition function is derived from model disagreement rather than from a single posterior, it may remain effective even when the fidelity hierarchy is misspecified.
  • The method offers a data-driven route to deciding how many low-fidelity runs to perform before adding each new high-fidelity point.

Load-bearing premise

The between-model variance obtained from Bayesian model averaging reliably identifies regions where additional high-fidelity evaluations will most improve the emulator without missing important areas or introducing systematic bias.

What would settle it

A benchmark problem in which the adaptive samples chosen by between-model variance produce no reduction in test error relative to random selection or to a competing acquisition function such as expected improvement.
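A comparison of this kind needs only two selection rules and an error metric. The sketch below assumes a finite candidate set and precomputed base-learner means at those candidates; all function names are hypothetical, and the refitting loop around them is left out because it depends on the emulator implementation.

```python
import numpy as np


def between_model_variance(means, weights):
    """Weighted spread of K base-learner means around the ensemble mean.

    means: shape (K, N); weights: shape (K,), summing to one.
    """
    means = np.asarray(means, dtype=float)
    w = np.asarray(weights, dtype=float)[:, None]
    ensemble_mean = (w * means).sum(axis=0)
    return (w * (means - ensemble_mean) ** 2).sum(axis=0)


def select_by_disagreement(candidates, means, weights):
    """Pick the candidate where the base learners disagree most."""
    scores = between_model_variance(means, weights)
    return candidates[int(np.argmax(scores))]


def select_at_random(candidates, rng):
    """Baseline: pick a candidate uniformly at random."""
    return candidates[rng.integers(len(candidates))]


def rmse(y_true, y_pred):
    """Root mean squared error on a held-out test set."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```

Running both rules from the same initial designs and tracking `rmse` after each added point gives the curves needed for the test: if the disagreement-driven curve is not below the random one, the premise fails on that problem.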

Figures

Figures reproduced from arXiv: 2604.18045 by Hossein Mohammadi.

Figure 1: Hierarchy of computer codes with $L$ levels of fidelity. The HF simulator, denoted by $f^{(L)}(x)$, yields the most accurate representation of the QoI at a high computational cost. The lower-fidelity models, $f^{(1)}(x), \ldots, f^{(L-1)}(x)$, provide cheaper but less accurate approximations of the same QoI. Real-time prediction of HF simulators is vital in applications such as climate change and nuclear safety, where…
Figure 2: Left: Multi-fidelity predictive mean $\bar{m}(x)$ (solid blue) together with the associated uncertainty bounds $\bar{m}(x) \pm 2\bar{\sigma}(x)$ (shaded area), shown relative to the LF (black dashed) and HF (red dashed) functions. For visual clarity, the LF points are omitted. Right: Multi-fidelity predictive variance $\bar{\sigma}^2(x)$ (black), which is the sum of the within-model variance $\sigma^2_{wm}(x)$ (blue) and the between-model variance $\sigma^2$…
Figure 3: Adaptive sampling driven by the between-model variance
Figure 4: Average RMSE for the 2D Currin function evaluated over 30 different initial LHS
Figure 5: Average RMSE for the 4D Park 1 function computed over 30 different initial
Figure 6: Average RMSE for the 4D Park 2 function computed over 30 different initial
Figure 7: Average RMSE for the 6D Hartmann function computed over 30 different initial
Figure 8: Performance of the adaptive sampling strategy using the between-model variance
Original abstract

High-resolution simulation models are essential for representing complex physical systems, yet their substantial computational cost severely limits the number of feasible high-fidelity (HF) evaluations. This problem is often addressed through multi-fidelity frameworks, which employ hierarchies of simulators with varying levels of fidelity and evaluation cost. A key difficulty in this setting is integrating information from such heterogeneous sources to accurately approximate HF simulators. This paper proposes a novel multi-fidelity emulation methodology based on ensemble learning. The base learners of the ensemble are hierarchical kriging emulators that systematically incorporate information from lower-fidelity models into HF predictions. Aggregation of these base learners via Bayesian model averaging yields the multi-fidelity emulator with principled uncertainty quantification. The between-model variance component of this uncertainty is then employed as the acquisition criterion in an adaptive design strategy to enrich the training set with informative samples. The predictive performance of the approach is assessed on a collection of well-established benchmark problems. Results show that our multi-fidelity emulator outperforms single-model alternatives in terms of accuracy and robustness. Furthermore, the adaptive design strategy effectively identifies informative samples and improves emulator performance under limited computational budgets.
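For readers unfamiliar with the base learners, the two-level hierarchical kriging model is commonly written as below. This is a sketch of the form usually attributed to Han and Görtz, not a transcription from the paper: $\hat{y}_{lf}$ denotes a kriging predictor fitted to low-fidelity data, $\beta_0$ a scaling factor, $Z$ a zero-mean stationary Gaussian process, $R$ the correlation matrix of the $n$ high-fidelity design points, $r(x)$ the correlation vector between $x$ and those points, and $y_S$ the observed high-fidelity outputs. The paper's parameterisation may differ.

```latex
Y(x) = \beta_0 \, \hat{y}_{lf}(x) + Z(x),

\hat{y}(x) = \beta_0 \, \hat{y}_{lf}(x)
           + r(x)^{\top} R^{-1} \bigl( y_S - \beta_0 F \bigr),
\qquad
F = \bigl[ \hat{y}_{lf}(x^{(1)}), \ldots, \hat{y}_{lf}(x^{(n)}) \bigr]^{\top},

\beta_0 = \bigl( F^{\top} R^{-1} F \bigr)^{-1} F^{\top} R^{-1} y_S .
```

The low-fidelity predictor plays the role of the trend function, so low-fidelity information enters the high-fidelity prediction through a single scaled term plus a Gaussian process correction.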

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper proposes an ensemble multi-fidelity emulator that uses hierarchical kriging models as base learners, aggregates them via Bayesian model averaging (BMA) to obtain predictions and uncertainty, and employs the between-model variance component of the BMA uncertainty as the acquisition function for adaptive sampling of additional high-fidelity points. It evaluates the method on standard benchmark problems and claims that the resulting emulator outperforms single-model alternatives in accuracy and robustness while the adaptive strategy improves performance under limited high-fidelity evaluation budgets.

Significance. If the central claims hold, the work offers a practical way to combine hierarchical kriging with ensemble uncertainty quantification for multi-fidelity surrogate modeling in expensive simulation settings. The explicit use of between-model variance for adaptive design is a clear integration of existing ideas, and the benchmark comparisons provide a starting point for assessing gains over single hierarchical kriging or other multi-fidelity baselines. The approach could be useful in engineering and scientific computing where HF evaluations are scarce, provided the acquisition function is shown to be robust to correlated model errors.

major comments (1)
  1. [Section 3.3 (Adaptive Design Strategy)] The adaptive sampling strategy (described after the BMA aggregation step) defines the acquisition function exclusively as the between-model variance of the hierarchical kriging ensemble. This quantity measures disagreement among base learners but excludes the individual predictive variances of each hierarchical kriging model and any systematic bias shared across all members (for example, identical low-fidelity correction terms or common kernel assumptions). When base learners exhibit positively correlated approximation errors, the acquisition function can assign low priority to regions where the multi-fidelity emulator remains inaccurate, directly threatening the claimed gains in accuracy and robustness under limited HF budgets.
minor comments (2)
  1. [Abstract and Section 4] The abstract and results section refer to 'well-established benchmark problems' and 'outperformance in terms of accuracy and robustness' without specifying the exact error metrics, number of replications, error bars, data exclusion rules, or precise hyperparameter settings for the hierarchical kriging base learners. These details are needed for reproducibility.
  2. [Section 3.2] Notation for the BMA weights and the decomposition of total variance into within- and between-model components should be introduced with explicit equations rather than descriptive text to avoid ambiguity when readers compare to standard BMA formulations.
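The major comment can be made concrete with a toy case. The sketch below is constructed purely for illustration and uses no data from the paper: two hypothetical base learners share a bias of 0.5 on the right half of the domain and disagree slightly only on the left half.

```python
import numpy as np

# One-dimensional toy problem; the "truth" stands in for the HF simulator.
x = np.linspace(0.0, 1.0, 101)
truth = np.sin(2 * np.pi * x)

# Error common to both learners: invisible to any disagreement measure.
shared_bias = np.where(x > 0.5, 0.5, 0.0)
# Small learner-specific offset, present only where the shared bias is zero.
private = np.where(x <= 0.5, 0.05, 0.0)

means = np.stack([truth + shared_bias + private,
                  truth + shared_bias - private])
weights = np.array([0.5, 0.5])

ensemble_mean = weights @ means
between = weights @ (means - ensemble_mean) ** 2
abs_error = np.abs(ensemble_mean - truth)

x_acquired = x[np.argmax(between)]      # where the acquisition would sample
x_worst = x[np.argmax(abs_error)]       # where the ensemble is least accurate
```

Here the between-model variance is zero wherever the ensemble error is largest, so `x_acquired` falls in the left half while `x_worst` falls in the right half. Real ensembles will rarely be this extreme, but the example shows why the referee asks for evidence that the base learners' errors are not strongly correlated.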

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive review and for highlighting an important consideration in our adaptive design strategy. We address the major comment point by point below and outline the revisions we will make.

Point-by-point responses
  1. Referee: [Section 3.3 (Adaptive Design Strategy)] The adaptive sampling strategy (described after the BMA aggregation step) defines the acquisition function exclusively as the between-model variance of the hierarchical kriging ensemble. This quantity measures disagreement among base learners but excludes the individual predictive variances of each hierarchical kriging model and any systematic bias shared across all members (for example, identical low-fidelity correction terms or common kernel assumptions). When base learners exhibit positively correlated approximation errors, the acquisition function can assign low priority to regions where the multi-fidelity emulator remains inaccurate, directly threatening the claimed gains in accuracy and robustness under limited HF budgets.

    Authors: We appreciate the referee's careful analysis of the acquisition function. In our ensemble, the hierarchical kriging base learners are constructed with deliberately varied hyperparameters, kernel functions, and low-fidelity correction structures to promote predictive diversity; this reduces (though does not eliminate) the risk of perfectly correlated errors. The between-model variance is used because, within the BMA framework, it isolates the component of uncertainty attributable to model choice, which is the quantity most directly reduced by acquiring new high-fidelity observations. The total BMA predictive variance already incorporates within-model variances, yet we deliberately select the between-model term to focus sampling on regions of model disagreement. Empirical results on the benchmark suite show consistent gains over single hierarchical kriging models, suggesting that the chosen acquisition function is effective in practice. We acknowledge that shared systematic bias remains a theoretical limitation of any ensemble method that does not explicitly model common error structure. Accordingly, we will add a paragraph in Section 3.3 and the discussion section explicitly noting this caveat, referencing the possibility of future hybrid acquisition functions that combine between-model variance with average within-model variance. No modification to the core algorithm or reported experiments is required.

    Revision: partial
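The hybrid acquisition function mentioned in the rebuttal is not specified further. One possible reading, offered only as a sketch, is a convex combination of the two variance components; the function name `hybrid_acquisition` and the mixing parameter `alpha` are hypothetical and do not come from the paper.

```python
import numpy as np


def hybrid_acquisition(means, variances, weights, alpha=0.5):
    """Blend between-model and weighted-average within-model variance.

    means, variances: shape (K, N); weights: shape (K,), summing to one.
    alpha = 1 recovers the pure between-model criterion;
    alpha = 0 uses only the weighted-average within-model variance.
    The blend itself is an assumed form, not the authors' proposal.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    w = np.asarray(weights, dtype=float)[:, None]

    ensemble_mean = (w * means).sum(axis=0)
    between = (w * (means - ensemble_mean) ** 2).sum(axis=0)
    within = (w * variances).sum(axis=0)
    return alpha * between + (1.0 - alpha) * within
```

Setting `alpha=0.5` ranks candidates exactly as the total BMA variance would, since the score is then half the sum of the two components. Values closer to 1 keep the emphasis on disagreement while still giving some weight to regions where all learners are individually uncertain.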

Circularity Check

0 steps flagged

No significant circularity detected in the derivation chain

Full rationale

The paper constructs its multi-fidelity emulator from hierarchical kriging base learners aggregated by standard Bayesian model averaging, then uses the between-model variance component as an acquisition function for adaptive sampling. These steps apply established ensemble and uncertainty-quantification techniques to the multi-fidelity setting without any self-definitional reduction, fitted parameter renamed as prediction, or load-bearing self-citation. Performance claims rest on empirical evaluation against benchmarks rather than being forced by the method's own equations. The derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach rests on standard assumptions from Gaussian process regression and ensemble methods; no new entities are postulated and no free parameters are explicitly fitted beyond those inherent to kriging.

axioms (2)
  • Domain assumption: hierarchical kriging models can systematically incorporate information from lower-fidelity simulators into high-fidelity predictions.
    Invoked when defining the base learners of the ensemble.
  • Domain assumption: between-model variance from Bayesian model averaging serves as an effective acquisition criterion for identifying informative samples.
    Central to the adaptive design strategy.

Appendix A Analytical test functions

… $\frac{x_4}{x_1^2}$ … $- 1$ … $+ (x_1 + 3x_4)\exp(1 + \sin(x_3))$, $f_{LF}(x) = \left(1 + \frac{\sin(x_1)}{10}\right) f_{HF}(x) - 2x_1 + x_2^2 + x_3^2 + 0.5$.

A.3 Park 2 function (4D)

$f_{HF}(x) = \frac{2}{3}\exp(x_1 + x_2) - x_4 \sin(x_3) + x_3$, $f_{LF}(x) = 1.2 f_{HF}(x) - 1$.

A.4 Hartmann function (6D)

$f_{HF}(x) = -\frac{1}{1.94}\left[2.58 + \sum_{i=1}^{4} \alpha_i \exp\left(-\sum_{j=1}^{6} A_{ij}(x_j - P_{ij})^2\right)\right]$, $f_{LF}(x) = -\frac{1}{1.94}\Big[2.58 + \sum_{i=1}^{3} \alpha_i \exp\Big(-\sum_{j=1}^{6} A_{ij}(x_j - P$...