An ensemble-based approach for multi-fidelity emulation and adaptive sampling
Pith reviewed 2026-05-10 04:17 UTC · model grok-4.3
The pith
Ensemble learning of hierarchical kriging models produces a multi-fidelity emulator whose disagreement measure drives effective adaptive sampling.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that aggregating hierarchical kriging emulators by Bayesian model averaging yields a multi-fidelity emulator whose between-model variance supplies an informative acquisition function for adaptive design, resulting in more accurate and robust approximations of high-fidelity simulators than single-model alternatives under fixed computational budgets.
What carries the argument
Bayesian model averaging of hierarchical kriging base learners, with the between-model variance component serving as the acquisition criterion for adaptive sampling.
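For concreteness, the standard BMA predictive mean and its variance split into within- and between-model parts (law of total variance) are shown below. This is the textbook form and is assumed, not confirmed, to match the paper's notation. It uses K base learners, weights w_k that sum to one, and base-learner predictive means μ_k(x) and variances σ_k²(x). The second (between-model) term is the quantity used as the acquisition function.

```latex
\hat{\mu}(\mathbf{x}) = \sum_{k=1}^{K} w_k\,\mu_k(\mathbf{x}),
\qquad
\hat{\sigma}^2(\mathbf{x})
  = \underbrace{\sum_{k=1}^{K} w_k\,\sigma_k^2(\mathbf{x})}_{\text{within-model}}
  + \underbrace{\sum_{k=1}^{K} w_k\,\bigl(\mu_k(\mathbf{x}) - \hat{\mu}(\mathbf{x})\bigr)^2}_{\text{between-model}}
```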
If this is right
- The multi-fidelity emulator achieves lower prediction error and greater robustness than any single hierarchical kriging model on the tested benchmarks.
- The adaptive design strategy based on between-model variance selects training points that measurably improve emulator accuracy when the high-fidelity budget is constrained.
- Uncertainty quantification arises directly from the ensemble and requires no separate variance model.
- The same framework applies across a collection of established test problems without problem-specific tuning of the fidelity hierarchy.
Where Pith is reading between the lines
- The approach could be combined with other base learners besides kriging to handle simulators whose response surfaces are poorly captured by Gaussian processes.
- Because the acquisition function is derived from model disagreement rather than from a single posterior, it may remain effective even when the fidelity hierarchy is misspecified.
- The method offers a data-driven route to deciding how many low-fidelity runs to perform before adding each new high-fidelity point.
Load-bearing premise
The between-model variance obtained from Bayesian model averaging reliably identifies regions where additional high-fidelity evaluations will most improve the emulator without missing important areas or introducing systematic bias.
What would settle it
A benchmark problem in which the adaptive samples chosen by between-model variance produce no reduction in test error relative to random selection or to a competing acquisition function such as expected improvement.
Original abstract
High-resolution simulation models are essential for representing complex physical systems, yet their substantial computational cost severely limits the number of feasible high-fidelity (HF) evaluations. This problem is often addressed through multi-fidelity frameworks, which employ hierarchies of simulators with varying levels of fidelity and evaluation cost. A key difficulty in this setting is integrating information from such heterogeneous sources to accurately approximate HF simulators. This paper proposes a novel multi-fidelity emulation methodology based on ensemble learning. The base learners of the ensemble are hierarchical kriging emulators that systematically incorporate information from lower-fidelity models into HF predictions. Aggregation of these base learners via Bayesian model averaging yields the multi-fidelity emulator with principled uncertainty quantification. The between-model variance component of this uncertainty is then employed as the acquisition criterion in an adaptive design strategy to enrich the training set with informative samples. The predictive performance of the approach is assessed on a collection of well-established benchmark problems. Results show that our multi-fidelity emulator outperforms single-model alternatives in terms of accuracy and robustness. Furthermore, the adaptive design strategy effectively identifies informative samples and improves emulator performance under limited computational budgets.
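A minimal sketch of one hierarchical kriging base learner is given below. It assumes the Han and Görtz formulation, in which the low-fidelity kriging predictor, scaled by a coefficient β0 estimated by generalized least squares, acts as the trend of the high-fidelity model. The class names, the squared-exponential kernel, the fixed length scale (no likelihood-based hyperparameter fitting), and the toy 1D fidelities are all illustrative assumptions and do not reflect the paper's settings.

```python
import numpy as np


def sq_exp_corr(A, B, length_scale):
    """Squared-exponential correlation between the rows of A (n, d) and B (m, d)."""
    diff = (A[:, None, :] - B[None, :, :]) / length_scale
    return np.exp(-0.5 * np.sum(diff**2, axis=-1))


class TrendKriging:
    """Kriging with a given trend basis F and a fixed length scale (no MLE)."""

    def __init__(self, length_scale=0.2, nugget=1e-8):
        self.length_scale = length_scale
        self.nugget = nugget

    def fit(self, X, y, F):
        R = sq_exp_corr(X, X, self.length_scale) + self.nugget * np.eye(len(X))
        L = np.linalg.cholesky(R)

        def r_solve(b):  # computes R^{-1} b through the Cholesky factor
            return np.linalg.solve(L.T, np.linalg.solve(L, b))

        Rinv_F = r_solve(F)
        # Generalized least-squares estimate of the trend coefficients
        self.beta = np.linalg.solve(F.T @ Rinv_F, Rinv_F.T @ y)
        # Weights of the GP correction fitted to the trend residuals
        self.alpha = r_solve(y - F @ self.beta)
        self.X = X
        return self

    def predict(self, X_new, F_new):
        r = sq_exp_corr(X_new, self.X, self.length_scale)
        return F_new @ self.beta + r @ self.alpha


class HierarchicalKriging:
    """Two-level sketch: the scaled LF predictor is the trend of the HF model."""

    def __init__(self, length_scale=0.2):
        self.lf = TrendKriging(length_scale)
        self.hf = TrendKriging(length_scale)

    def _lf_trend(self, X):
        return self.lf.predict(X, np.ones((len(X), 1)))[:, None]

    def fit(self, X_lf, y_lf, X_hf, y_hf):
        # Level 1: ordinary kriging (constant trend) on the LF data
        self.lf.fit(X_lf, y_lf, np.ones((len(X_lf), 1)))
        # Level 2: beta_0 * lf_prediction(x) is the HF trend, beta_0 fitted by GLS
        self.hf.fit(X_hf, y_hf, self._lf_trend(X_hf))
        return self

    def predict(self, X):
        return self.hf.predict(X, self._lf_trend(X))


# Hypothetical 1D fidelities for a toy run
def f_hf(x):
    return np.sin(8.0 * x[:, 0])


def f_lf(x):
    return 0.8 * np.sin(8.0 * x[:, 0]) + 0.2


X_lf = np.linspace(0.0, 1.0, 11)[:, None]
X_hf = np.linspace(0.0, 1.0, 5)[:, None]
hk = HierarchicalKriging().fit(X_lf, f_lf(X_lf), X_hf, f_hf(X_hf))
X_test = np.linspace(0.0, 1.0, 50)[:, None]
y_pred = hk.predict(X_test)
```

With a tiny nugget, the HF predictor interpolates the HF training data, while the LF-driven trend carries information between HF points. An ensemble in the spirit of the paper would fit several such learners with different kernels or hyperparameters.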
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes an ensemble multi-fidelity emulator that uses hierarchical kriging models as base learners, aggregates them via Bayesian model averaging (BMA) to obtain predictions and uncertainty, and employs the between-model variance component of the BMA uncertainty as the acquisition function for adaptive sampling of additional high-fidelity points. It evaluates the method on standard benchmark problems and claims that the resulting emulator outperforms single-model alternatives in accuracy and robustness while the adaptive strategy improves performance under limited high-fidelity evaluation budgets.
Significance. If the central claims hold, the work offers a practical way to combine hierarchical kriging with ensemble uncertainty quantification for multi-fidelity surrogate modeling in expensive simulation settings. The explicit use of between-model variance for adaptive design is a clear integration of existing ideas, and the benchmark comparisons provide a starting point for assessing gains over single hierarchical kriging or other multi-fidelity baselines. The approach could be useful in engineering and scientific computing where HF evaluations are scarce, provided the acquisition function is shown to be robust to correlated model errors.
major comments (1)
- [Section 3.3 (Adaptive Design Strategy)] The adaptive sampling strategy (described after the BMA aggregation step) defines the acquisition function exclusively as the between-model variance of the hierarchical kriging ensemble. This quantity measures disagreement among base learners but excludes the individual predictive variances of each hierarchical kriging model and any systematic bias shared across all members (for example, identical low-fidelity correction terms or common kernel assumptions). When base learners exhibit positively correlated approximation errors, the acquisition function can assign low priority to regions where the multi-fidelity emulator remains inaccurate, directly threatening the claimed gains in accuracy and robustness under limited HF budgets.
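The referee's concern can be made concrete with a toy calculation. The numbers below are hypothetical and not taken from the paper. When every base learner carries the same offset at an input, their between-model variance there is zero, even though the aggregated prediction is wrong.

```python
import numpy as np


def between_model_variance(means, weights):
    """Weighted spread of base-learner means around the BMA mean.

    means: (K, m) array with one row per base learner; weights: (K,) summing to 1.
    """
    means = np.asarray(means, dtype=float)
    weights = np.asarray(weights, dtype=float)
    bma_mean = weights @ means
    return weights @ (means - bma_mean) ** 2


# Hypothetical true HF values at three candidate inputs
truth = np.array([1.0, 2.0, 3.0])
# Three base learners; at the third input they all share the same +0.5 error,
# e.g. because they inherit an identical low-fidelity correction
means = np.array([
    [1.2, 2.0, 3.5],
    [0.8, 2.1, 3.5],
    [1.0, 1.9, 3.5],
])
weights = np.full(3, 1.0 / 3.0)

acquisition = between_model_variance(means, weights)
abs_error = np.abs(weights @ means - truth)
print(acquisition)  # highest at input 0, essentially zero at input 2
print(abs_error)    # essentially zero at inputs 0 and 1, 0.5 at input 2
```

A pure between-model criterion would sample input 0 next, while the only real error sits at input 2, which is exactly the blind spot described above.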
minor comments (2)
- [Abstract and Section 4] The abstract and results section refer to 'well-established benchmark problems' and 'outperformance in terms of accuracy and robustness' without specifying the exact error metrics, number of replications, error bars, data exclusion rules, or precise hyperparameter settings for the hierarchical kriging base learners. These details are needed for reproducibility.
- [Section 3.2] Notation for the BMA weights and the decomposition of total variance into within- and between-model components should be introduced with explicit equations rather than descriptive text to avoid ambiguity when readers compare to standard BMA formulations.
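As an illustration of what explicit notation would pin down, the BMA step can be sketched in code. The paper's weighting scheme is not described on this page. Deriving the weights from per-model log marginal likelihoods, with an optional log prior (uniform by default), is therefore an assumption, and `bma_weights` and `bma_predict` are hypothetical helper names.

```python
import numpy as np


def bma_weights(log_evidence, log_prior=None):
    """Posterior model probabilities w_k proportional to p(D | M_k) p(M_k)."""
    log_post = np.asarray(log_evidence, dtype=float).copy()
    if log_prior is not None:
        log_post += np.asarray(log_prior, dtype=float)
    # Shift by the maximum before exponentiating (log-sum-exp trick)
    log_post -= log_post.max()
    w = np.exp(log_post)
    return w / w.sum()


def bma_predict(means, variances, weights):
    """BMA mean plus the within- and between-model variance components.

    means, variances: (K, m) arrays of base-learner predictions at m inputs.
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    weights = np.asarray(weights, dtype=float)
    mean = weights @ means
    within = weights @ variances
    between = weights @ (means - mean) ** 2
    return mean, within, between


# Two hypothetical base learners whose evidences differ by a factor of 3
w = bma_weights([-10.0, -10.0 + np.log(3.0)])
mean, within, between = bma_predict(
    means=[[0.0, 1.0], [2.0, 1.0]],
    variances=[[0.1, 0.2], [0.3, 0.2]],
    weights=w,
)
total = within + between
```

Here the weights are 0.25 and 0.75. At the first input the learners disagree, so the between-model term (0.75) dominates a total variance of 1.0. At the second input they agree, so only the within-model term (0.2) remains.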
Simulated Author's Rebuttal
We thank the referee for their constructive review and for highlighting an important consideration in our adaptive design strategy. We address the major comment point by point below and outline the revisions we will make.
Referee: [Section 3.3 (Adaptive Design Strategy)] The adaptive sampling strategy (described after the BMA aggregation step) defines the acquisition function exclusively as the between-model variance of the hierarchical kriging ensemble. This quantity measures disagreement among base learners but excludes the individual predictive variances of each hierarchical kriging model and any systematic bias shared across all members (for example, identical low-fidelity correction terms or common kernel assumptions). When base learners exhibit positively correlated approximation errors, the acquisition function can assign low priority to regions where the multi-fidelity emulator remains inaccurate, directly threatening the claimed gains in accuracy and robustness under limited HF budgets.
Authors: We appreciate the referee's careful analysis of the acquisition function. In our ensemble, the hierarchical kriging base learners are constructed with deliberately varied hyperparameters, kernel functions, and low-fidelity correction structures to promote predictive diversity; this reduces (though does not eliminate) the risk of perfectly correlated errors. The between-model variance is used because, within the BMA framework, it isolates the component of uncertainty attributable to model choice, which is the quantity most directly reduced by acquiring new high-fidelity observations. The total BMA predictive variance already incorporates within-model variances, yet we deliberately select the between-model term to focus sampling on regions of model disagreement. Empirical results on the benchmark suite show consistent gains over single hierarchical kriging models, suggesting that the chosen acquisition function is effective in practice. We acknowledge that shared systematic bias remains a theoretical limitation of any ensemble method that does not explicitly model common error structure. Accordingly, we will add a paragraph in Section 3.3 and the discussion section explicitly noting this caveat, referencing the possibility of future hybrid acquisition functions that combine between-model variance with average within-model variance. No modification to the core algorithm or reported experiments is required.
Revision: partial
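The hybrid criterion mentioned as future work could take many forms. A minimal sketch, assuming a simple additive combination with a hypothetical weight `lam`, is shown below. Setting `lam = 0` recovers the between-model criterion used in the paper, and `lam = 1` gives the total BMA predictive variance.

```python
import numpy as np


def hybrid_acquisition(means, variances, weights, lam=0.0):
    """Between-model variance plus `lam` times the weighted within-model variance.

    lam = 0 is the pure between-model criterion; lam = 1 is the total BMA variance.
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    weights = np.asarray(weights, dtype=float)
    bma_mean = weights @ means
    between = weights @ (means - bma_mean) ** 2
    within = weights @ variances
    return between + lam * within


def select_next(candidates, means, variances, weights, lam=0.0):
    """Return the candidate input with the highest acquisition score."""
    scores = hybrid_acquisition(means, variances, weights, lam)
    return candidates[int(np.argmax(scores))], scores


# Hypothetical predictions of two base learners at three candidate inputs
candidates = np.array([[0.1], [0.5], [0.9]])
means = [[1.0, 2.0, 3.0], [1.2, 2.0, 3.0]]
variances = [[0.01, 0.01, 0.5], [0.01, 0.01, 0.5]]
weights = [0.5, 0.5]

x_between, _ = select_next(candidates, means, variances, weights, lam=0.0)
x_total, _ = select_next(candidates, means, variances, weights, lam=1.0)
```

With these toy numbers, the pure between-model criterion picks x = 0.1, where the two learners disagree slightly. Setting lam = 1 instead picks x = 0.9, where both learners report a large but identical predictive variance.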
Circularity Check
No significant circularity detected in the derivation chain
The paper constructs its multi-fidelity emulator from hierarchical kriging base learners aggregated by standard Bayesian model averaging, then uses the between-model variance component as an acquisition function for adaptive sampling. These steps apply established ensemble and uncertainty-quantification techniques to the multi-fidelity setting without any self-definitional reduction, fitted parameter renamed as prediction, or load-bearing self-citation. Performance claims rest on empirical evaluation against benchmarks rather than being forced by the method's own equations. The derivation chain is therefore self-contained, and its claims are tested against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: hierarchical kriging models can systematically incorporate information from lower-fidelity simulators into high-fidelity predictions.
- Domain assumption: the between-model variance from Bayesian model averaging serves as an effective acquisition criterion for identifying informative samples.
Appendix A: Analytical test functions
The analytical expressions for the test functions, defined on the unit hype...
$$\cdots - 1\Big) + (x_1 + 3x_4)\exp\big(1 + \sin(x_3)\big), \qquad f_{\mathrm{LF}}(\mathbf{x}) = \Big(1 + \frac{\sin(x_1)}{10}\Big) f_{\mathrm{HF}}(\mathbf{x}) - 2x_1 + x_2^2 + x_3^2 + 0.5.$$
A.3 Park 2 function (4D)
$$f_{\mathrm{HF}}(\mathbf{x}) = \frac{2}{3}\exp(x_1 + x_2) - x_4\sin(x_3) + x_3, \qquad f_{\mathrm{LF}}(\mathbf{x}) = 1.2\,f_{\mathrm{HF}}(\mathbf{x}) - 1.$$
A.4 Hartmann function (6D)
$$f_{\mathrm{HF}}(\mathbf{x}) = -\frac{1}{1.94}\Big(2.58 + \sum_{i=1}^{4} \alpha_i \exp\Big(-\sum_{j=1}^{6} A_{ij}(x_j - P_{ij})^2\Big)\Big),$$
$$f_{\mathrm{LF}}(\mathbf{x}) = -\frac{1}{1.94}\Big(2.58 + \sum_{i=1}^{3} \alpha_i \exp\Big(-\sum_{j=1}^{6} A_{ij}(x_j - P \ldots$$
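The Park 2 pair (A.3) is fully specified, so it can be coded directly, which helps when reproducing that benchmark. The vectorized signature (inputs as an (n, 4) array or a single length-4 vector) and the function names are assumptions of this sketch.

```python
import numpy as np


def park2_hf(x):
    """High-fidelity Park 2 function; x has shape (n, 4) or (4,)."""
    x1, x2, x3, x4 = np.atleast_2d(np.asarray(x, dtype=float)).T
    return (2.0 / 3.0) * np.exp(x1 + x2) - x4 * np.sin(x3) + x3


def park2_lf(x):
    """Low-fidelity Park 2 function: 1.2 * f_HF(x) - 1."""
    return 1.2 * park2_hf(x) - 1.0


X = np.array([[0.0, 0.0, 0.0, 0.0], [0.5, 0.5, 1.0, 1.0]])
y_hf = park2_hf(X)
y_lf = park2_lf(X)
```

Because the LF response here is an exact affine map of the HF response, Park 2 is an easy case for methods that model a scaled LF trend, such as hierarchical kriging.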