Benchmarking Machine Learning Uncertainty Quantification Methodologies for Predicting Turbine Gas Temperature Degradation
Pith reviewed 2026-06-29 08:31 UTC · model grok-4.3
The pith
Experiments on turbine gas temperature data show five uncertainty methods exhibit distinct trade-offs in coverage, width, and stability.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The five methods (Delta, Bayesian Monte Carlo Dropout, Bootstrap, Lower-Upper Bound Estimation, and Mean-Variance Estimation) are evaluated on the turbine gas temperature dataset within a unified framework of cross-validation and repeated splits. Measured by coverage probability, normalized mean prediction interval width, and the coverage width-based criterion, each method shows a distinct pattern of strengths and weaknesses.
What carries the argument
A unified experimental framework consisting of cross-validation for hyperparameter selection, repeated train-test splits for robustness, and three evaluation metrics applied to the five interval construction approaches.
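The protocol described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's implementation: the data are synthetic, a polynomial fit stands in for the neural network, the interval is a simple residual-quantile band rather than any of the five benchmarked methods, and the names `cv_select_degree` and `run_repeated_splits` are hypothetical.

```python
import numpy as np

def cv_select_degree(x, y, degrees, k, rng):
    """Pick the polynomial degree with the lowest mean k-fold validation MSE."""
    folds = np.array_split(rng.permutation(len(x)), k)
    scores = []
    for d in degrees:
        errs = []
        for i in range(k):
            val = folds[i]
            trn = np.concatenate([folds[j] for j in range(k) if j != i])
            coef = np.polyfit(x[trn], y[trn], d)
            errs.append(np.mean((np.polyval(coef, x[val]) - y[val]) ** 2))
        scores.append(np.mean(errs))
    return degrees[int(np.argmin(scores))]

def run_repeated_splits(x, y, n_repeats=20, test_frac=0.2, nominal=0.95, seed=0):
    """Repeat random train-test splits and summarise coverage and width."""
    rng = np.random.default_rng(seed)
    n = len(x)
    n_test = int(round(test_frac * n))
    coverages, widths = [], []
    for _ in range(n_repeats):
        perm = rng.permutation(n)
        test, train = perm[:n_test], perm[n_test:]
        # Hyperparameter selection uses the training portion only.
        degree = cv_select_degree(x[train], y[train], [1, 2, 3], k=5, rng=rng)
        coef = np.polyfit(x[train], y[train], degree)
        # Stand-in interval (assumption): symmetric band whose half-width is
        # the `nominal` quantile of the absolute training residuals.
        resid = np.abs(y[train] - np.polyval(coef, x[train]))
        half = np.quantile(resid, nominal)
        pred = np.polyval(coef, x[test])
        inside = (y[test] >= pred - half) & (y[test] <= pred + half)
        coverages.append(inside.mean())
        widths.append(2 * half / (y[test].max() - y[test].min()))
    return {
        "picp_mean": float(np.mean(coverages)),
        "picp_std": float(np.std(coverages)),
        "nmpiw_mean": float(np.mean(widths)),
        "nmpiw_std": float(np.std(widths)),
    }

# Synthetic example data (assumption), not turbine measurements.
data_rng = np.random.default_rng(42)
x = data_rng.uniform(0.0, 1.0, 300)
y = 600.0 + 40.0 * x**2 + data_rng.normal(0.0, 2.0, 300)
summary = run_repeated_splits(x, y)
print(summary)
```

The standard deviations across repeats give one possible reading of the "stability" the summary refers to: a method whose coverage varies widely between splits is less dependable even if its mean coverage is close to the nominal level.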
If this is right
- The choice of method affects both the safety margin and the tightness of predictions in engine health management.
- Stability across data splits varies, influencing how dependable the uncertainty estimates are in repeated use.
- A practical selection guide emerges for balancing interpretability and precision in real-world prognostics applications.
- Different priorities, such as high coverage for critical safety versus narrow intervals for operational efficiency, lead to different preferred methods.
Where Pith is reading between the lines
- The observed trade-offs could be tested on datasets from other engine components to see if the same patterns hold.
- Incorporating time-series aspects of degradation might alter which method performs best.
- The findings suggest that ensemble or hybrid uncertainty methods could combine the strengths seen in individual approaches.
Load-bearing premise
The turbine gas temperature dataset and the chosen experimental protocol are representative enough that the observed trade-offs generalize beyond this specific case.
What would settle it
Obtaining a different ranking of the five methods when the identical procedures are applied to temperature data recorded from a different set of engines or under altered operating conditions.
Original abstract
Effective prognostics and health management of modern engines relies on accurate turbine gas temperature predictions and robust uncertainty quantification to ensure reliability and safety. This paper investigates five major approaches for constructing prediction intervals -- namely the Delta method, Bayesian Monte Carlo Dropout, Bootstrap method, Lower-Upper Bound Estimation, and Mean-Variance Estimation -- as a means of capturing the uncertainty in neural network predictions of turbine gas temperature. Each approach is implemented within a unified experimental framework that employs cross-validation for hyperparameter selection, repeated train-test splits for performance robustness, and multiple metrics to evaluate both the accuracy and tightness of the intervals. In particular, Coverage Probability, Normalized Mean Prediction Interval Width, and the Coverage Width-based Criterion are measured to comprehensively assess each method's reliability and sharpness. Experiments conducted on a representative turbine gas temperature dataset reveal distinct trade-offs among the five methods in terms of interval coverage, width, and stability. These findings provide a practical guide for selecting and tuning prediction interval methods in engine health management and prognostics, ensuring both interpretability and precision in real-world applications.
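The three metrics named in the abstract can be sketched as below. This is a minimal version under stated assumptions: the CWC form follows the common definition in the prediction-interval literature (width multiplied by an exponential penalty that applies only when coverage falls below the nominal level), and the values of `nominal` and `eta` are illustrative defaults; the summary does not say which variant or settings the paper uses.

```python
import numpy as np

def picp(y, lower, upper):
    """Coverage probability: fraction of targets inside [lower, upper]."""
    y, lower, upper = map(np.asarray, (y, lower, upper))
    return float(np.mean((y >= lower) & (y <= upper)))

def nmpiw(y, lower, upper):
    """Mean interval width divided by the range of the targets."""
    y, lower, upper = map(np.asarray, (y, lower, upper))
    target_range = float(np.max(y) - np.min(y))
    return float(np.mean(upper - lower) / target_range)

def cwc(y, lower, upper, nominal=0.95, eta=50.0):
    """Coverage width-based criterion (assumed form).

    Equals NMPIW when coverage meets the nominal level; otherwise the width
    is inflated by exp(-eta * (PICP - nominal)). Lower is better.
    """
    coverage = picp(y, lower, upper)
    width = nmpiw(y, lower, upper)
    gamma = 1.0 if coverage < nominal else 0.0
    return float(width * (1.0 + gamma * np.exp(-eta * (coverage - nominal))))

# Toy usage with made-up numbers: five targets, unit-width intervals.
y_true = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
lo = y_true - 0.5
hi = y_true + 0.5
print(picp(y_true, lo, hi), nmpiw(y_true, lo, hi), cwc(y_true, lo, hi))
```

Because the penalty grows exponentially, a method that misses the nominal coverage by even a few points can score far worse on CWC than a wider but well-covering one, which is why coverage and width are usually reported alongside it.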
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper benchmarks five uncertainty quantification methods (Delta method, Bayesian Monte Carlo Dropout, Bootstrap, Lower-Upper Bound Estimation, and Mean-Variance Estimation) for neural network predictions of turbine gas temperature degradation. It implements each within a unified experimental framework using cross-validation for hyperparameter selection, repeated train-test splits for robustness, and evaluates them via Coverage Probability, Normalized Mean Prediction Interval Width, and Coverage Width-based Criterion on a representative dataset, identifying distinct trade-offs in coverage, width, and stability to guide selection in engine health management.
Significance. If the empirical comparisons hold, the work offers a practical, reproducible guide for UQ method selection in safety-critical prognostics, where interval coverage and sharpness directly affect reliability assessments. The unified framework with repeated splits and multiple complementary metrics (coverage probability, PICP; normalized mean prediction interval width, NMPIW; coverage width-based criterion, CWC) is a clear strength, as is the focus on a domain-specific dataset. Together these elements support falsifiable, actionable findings rather than isolated method claims.
minor comments (3)
- [Abstract, §1] The claim of 'distinct trade-offs' would be strengthened by including one or two key quantitative results (e.g., specific PICP/NMPIW values or rankings) rather than remaining purely qualitative.
- [§3, §4] The manuscript should explicitly state the neural network architecture, training hyperparameters, and any data preprocessing steps used across all methods to ensure full reproducibility of the unified framework.
- [Figures 3-5] Figure captions and axis labels for the trade-off plots should include the exact number of repeated splits and the dataset size to allow readers to assess statistical stability without returning to the text.
Simulated Author's Rebuttal
We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. No specific major comments were provided in the report, so we have no points requiring direct response or manuscript changes at this time.
Circularity Check
No significant circularity
The paper performs an empirical benchmarking of five standard uncertainty quantification methods (Delta, Bayesian MC Dropout, Bootstrap, LUBE, MVE) on a turbine gas temperature dataset. It applies cross-validation, repeated splits, and external metrics (coverage probability, NMPIW, CWC) without any derivations, first-principles predictions, or equations that could reduce to fitted inputs. No self-citations serve as load-bearing uniqueness theorems or ansatzes; the methods are implemented directly from the literature and evaluated against independent performance criteria. The central claims concern observed trade-offs under a unified framework and do not rely on any self-referential construction.