pith.

arXiv: 2605.30585 · v1 · pith:NPEPEZ3X · submitted 2026-05-28 · 💻 cs.LG · cs.AI · cs.CE

Benchmarking Machine Learning Uncertainty Quantification Methodologies for Predicting Turbine Gas Temperature Degradation

Pith reviewed 2026-06-29 08:31 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.CE
keywords: uncertainty quantification · prediction intervals · neural networks · turbine gas temperature · engine prognostics · benchmarking · machine learning · degradation prediction

The pith

Experiments on turbine gas temperature data show that five uncertainty quantification methods exhibit distinct trade-offs in coverage, width, and stability.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to compare five standard techniques for producing prediction intervals around neural network forecasts of turbine gas temperature (TGT). It implements each technique in the same testing setup, which uses cross-validation to select hyperparameters and repeated train-test splits to measure how often the intervals contain the actual values and how wide they are. A reader would care because accurate uncertainty estimates help decide when an engine needs maintenance without incurring unnecessary downtime or risk. The work finds that the methods differ in how they balance reliability against precision, giving practitioners a basis for choosing according to whether safety or efficiency matters more.

Core claim

When the five methods (Delta, Bayesian Monte Carlo Dropout, Bootstrap, Lower-Upper Bound Estimation, and Mean-Variance Estimation) are evaluated on the turbine gas temperature dataset using coverage probability, normalized mean prediction interval width, and the coverage width-based criterion within a unified framework of cross-validation and repeated splits, each method displays a distinct pattern of strengths and weaknesses across those measures.
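For reference, the three measures are usually defined as below in the neural-network prediction-interval literature, with the CWC form following Khosravi and co-authors' lower upper bound estimation work. Here L_i and U_i are the lower and upper bounds for target y_i over n test points. This page does not state the paper's exact variant, so the nominal coverage μ and penalty strength η should be read as assumed hyperparameters, and normalizing by the observed target range R is likewise an assumption.

```latex
\begin{aligned}
\mathrm{PICP} &= \frac{1}{n}\sum_{i=1}^{n} c_i, \qquad c_i = \mathbf{1}\left[L_i \le y_i \le U_i\right] \\
\mathrm{MPIW} &= \frac{1}{n}\sum_{i=1}^{n} \left(U_i - L_i\right) \\
\mathrm{NMPIW} &= \frac{\mathrm{MPIW}}{R}, \qquad R = \max_i y_i - \min_i y_i \\
\mathrm{CWC} &= \mathrm{NMPIW}\left(1 + \gamma(\mathrm{PICP})\, e^{-\eta\,(\mathrm{PICP} - \mu)}\right), \qquad
\gamma(\mathrm{PICP}) = \mathbf{1}\left[\mathrm{PICP} < \mu\right]
\end{aligned}
```

Because γ switches the exponential term on only when coverage falls below μ, CWC reduces to NMPIW for intervals that meet the nominal coverage and grows rapidly for those that do not.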

What carries the argument

A unified experimental framework consisting of cross-validation for hyperparameter selection, repeated train-test splits for robustness, and three evaluation metrics applied to the five interval construction approaches.
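The three metrics are short enough to compute directly. The sketch below is a plain-Python version of the standard definitions. It assumes that NMPIW normalizes by the range of the observed targets and that CWC applies the usual exponential penalty below a nominal coverage `mu`. The defaults `mu=0.95` and `eta=50.0` are illustrative assumptions, not values taken from the paper.

```python
import math
from typing import Sequence


def picp(y: Sequence[float], lower: Sequence[float], upper: Sequence[float]) -> float:
    """Prediction interval coverage probability: fraction of targets inside [lower, upper]."""
    hits = sum(1 for yi, lo, hi in zip(y, lower, upper) if lo <= yi <= hi)
    return hits / len(y)


def nmpiw(y: Sequence[float], lower: Sequence[float], upper: Sequence[float]) -> float:
    """Mean prediction interval width divided by the range of the observed targets."""
    mpiw = sum(hi - lo for lo, hi in zip(lower, upper)) / len(y)
    return mpiw / (max(y) - min(y))


def cwc(y, lower, upper, mu: float = 0.95, eta: float = 50.0) -> float:
    """Coverage width-based criterion: NMPIW, inflated exponentially when PICP < mu.

    mu (nominal coverage) and eta (penalty strength) are assumed defaults.
    """
    coverage = picp(y, lower, upper)
    width = nmpiw(y, lower, upper)
    gamma = 1.0 if coverage < mu else 0.0
    return width * (1.0 + gamma * math.exp(-eta * (coverage - mu)))


# Toy example: 3 of 4 targets fall inside their intervals.
y = [10.0, 12.0, 14.0, 16.0]
lower = [9.0, 11.5, 14.5, 15.0]
upper = [11.0, 12.5, 15.5, 17.0]
print(picp(y, lower, upper))   # 0.75
print(nmpiw(y, lower, upper))  # 0.25 (mean width 1.5 over target range 6.0)
```

With the default `mu=0.95`, the 0.75 coverage triggers the penalty and CWC becomes very large. That is the intended behavior: CWC rewards narrow intervals only once coverage reaches the nominal level.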

If this is right

  • The choice of method affects both the safety margin and the tightness of predictions in engine health management.
  • Stability across data splits varies, influencing how dependable the uncertainty estimates are in repeated use.
  • A practical selection guide emerges for balancing interpretability and precision in real-world prognostics applications.
  • Different priorities, such as high coverage for critical safety versus narrow intervals for operational efficiency, lead to different preferred methods.
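Stability across splits, in the sense used in these bullets, can be measured by repeating a random train/test split many times and reporting the spread of each metric. The sketch below uses a toy setup that is not the paper's. A straight-line least-squares fit stands in for the neural network, and the interval is the point prediction plus or minus an empirical quantile of the absolute training residuals, which is not one of the five benchmarked methods. Function names, the split fraction, and the repeat count are hypothetical.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def residual_quantile_interval(train, level=0.95):
    """Stand-in interval method (not one of the paper's five):
    line prediction +/- an empirical quantile of the absolute training residuals."""
    xs, ys = zip(*train)
    a, b = fit_line(xs, ys)
    abs_res = sorted(abs(y - (a + b * x)) for x, y in train)
    half_width = abs_res[min(len(abs_res) - 1, int(level * len(abs_res)))]
    return lambda x: (a + b * x - half_width, a + b * x + half_width)


def repeated_split_coverage(data, n_repeats=20, test_fraction=0.3, seed=0):
    """Mean and sample standard deviation of test-set coverage over random splits."""
    rng = random.Random(seed)
    coverages = []
    for _ in range(n_repeats):
        shuffled = list(data)
        rng.shuffle(shuffled)
        n_test = int(test_fraction * len(shuffled))
        test, train = shuffled[:n_test], shuffled[n_test:]
        interval = residual_quantile_interval(train)
        hits = 0
        for x, y in test:
            lo, hi = interval(x)
            hits += lo <= y <= hi
        coverages.append(hits / len(test))
    return statistics.fmean(coverages), statistics.stdev(coverages)


# Synthetic data: a noisy line standing in for real TGT measurements.
gen = random.Random(1)
data = [(x / 10, 2.0 + 0.5 * (x / 10) + gen.gauss(0.0, 1.0)) for x in range(200)]
mean_cov, sd_cov = repeated_split_coverage(data)
print(f"coverage {mean_cov:.3f} +/- {sd_cov:.3f} over 20 splits")
```

The same loop applies to width or CWC. Reporting the standard deviation next to the mean is what makes split-to-split stability visible, rather than a single lucky or unlucky split.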

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The observed trade-offs could be tested on datasets from other engine components to see if the same patterns hold.
  • Incorporating time-series aspects of degradation might alter which method performs best.
  • The findings suggest that ensemble or hybrid uncertainty methods could combine the strengths seen in individual approaches.

Load-bearing premise

The single turbine gas temperature dataset used here, together with the chosen experimental protocol, is representative enough to reveal general trade-offs that apply beyond this specific case.

What would settle it

Obtaining a different ranking of the five methods when the identical procedures are applied to temperature data recorded from a different set of engines or under altered operating conditions.
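A simple way to quantify "a different ranking" is Kendall's tau between the method scores obtained on two datasets. A value of +1 means the two orderings agree on every pair of methods, and -1 means every pair is reversed. The sketch assumes one score per method per dataset (for example, mean CWC over the repeated splits) and uses tau-a, so tied pairs count as neither concordant nor discordant. The function name is hypothetical.

```python
from itertools import combinations
from typing import Mapping


def kendall_tau(scores_a: Mapping[str, float], scores_b: Mapping[str, float]) -> float:
    """Kendall's tau-a between two scorings of the same set of methods."""
    methods = sorted(scores_a)
    if set(methods) != set(scores_b):
        raise ValueError("both scorings must cover the same methods")
    concordant = discordant = 0
    for m1, m2 in combinations(methods, 2):
        # Same sign of difference in both scorings -> the pair is ordered the same way.
        sign = (scores_a[m1] - scores_a[m2]) * (scores_b[m1] - scores_b[m2])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(methods) * (len(methods) - 1) // 2
    return (concordant - discordant) / n_pairs
```

Both score dictionaries must use the same orientation (for example, lower CWC is better in both). With only five methods there are ten pairs, so a single swapped pair already moves tau from 1.0 to 0.8.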

Figures

Figures reproduced from arXiv: 2605.30585 by Adrian Sandu, Andrew Rimell, Changmin Son, Gavan Burke, James G. Steinrock, Jostein Barry-Straume, Rekha Sundararajan.

Figure 1: Experimental workflow for training, tuning, and evaluating TGT …
Figure 2: Overall comparison of the mean absolute TGT prediction error for the …
Figure 3: Overall comparison of the standard deviation of absolute TGT predic…
Figure 4: Overall comparison of the mean prediction interval width (MPIW) for …
Figure 5: Overall comparison of the normalized mean prediction interval width …
Figure 6: Overall comparison of the standard deviation of prediction interval …
Figure 7: Overall comparison of the prediction interval coverage probability …
Figure 8: Overall comparison of the coverage width-based criterion (CWC) for …
Figure 9: Mean absolute TGT prediction error (MAE) by uncertainty quantifi…
Figure 10: Standard deviation of TGT prediction error (absolute error) by …
Figure 11: Mean prediction interval width (MPIW) of TGT prediction intervals …
Figure 12: Standard deviation of TGT prediction interval width by uncertainty …
Figure 13: Normalized mean prediction interval width (NMPIW) by uncertainty …
Figure 14: Prediction interval coverage probability (PICP) by uncertainty quan…
Figure 15: Mean absolute TGT prediction error (MAE) by uncertainty quantifi…
Figure 16: Standard deviation of TGT prediction error (absolute error) by …
Figure 17: Mean prediction interval width (MPIW) of TGT prediction intervals …
Figure 18: Normalized mean prediction interval width (NMPIW) by uncertainty …
Figure 19: Prediction interval coverage probability (PICP) by uncertainty quan…
Figure 20: Takeoff Phase. Comparison of the mean absolute TGT prediction …
Figure 21: Takeoff Phase. Comparison of the standard deviation of absolute …
Figure 22: Takeoff Phase. Comparison of the mean prediction interval width …
Figure 23: Takeoff Phase. Comparison of the normalized mean prediction interval …
Figure 24: Takeoff Phase. Comparison of the prediction interval coverage …
Figure 25: Takeoff Phase. Comparison of the coverage width-based criterion …
Figure 26: Climb Phase. Comparison of the mean absolute TGT prediction …
Figure 27: Climb Phase. Comparison of the standard deviation of absolute TGT …
Figure 28: Climb Phase. Comparison of the mean prediction interval width …
Figure 29: Climb Phase. Comparison of the normalized mean prediction interval …
Figure 30: Climb Phase. Comparison of the prediction interval coverage proba…
Figure 31: Climb Phase. Comparison of the coverage width-based criterion …
Figure 32: Cruise Phase. Comparison of the mean absolute TGT prediction …
Figure 33: Cruise Phase. Comparison of the standard deviation of absolute TGT …
Figure 34: Cruise Phase. Comparison of the mean prediction interval width …
Figure 35: Cruise Phase. Comparison of the normalized mean prediction interval …
Figure 36: Cruise Phase. Comparison of the prediction interval coverage proba…
Figure 37: Cruise Phase. Comparison of the coverage width-based criterion …
Original abstract

Effective prognostics and health management of modern engines relies on accurate turbine gas temperature predictions and robust uncertainty quantification to ensure reliability and safety. This paper investigates five major approaches for constructing prediction intervals -- namely the Delta method, Bayesian Monte Carlo Dropout, Bootstrap method, Lower-Upper Bound Estimation, and Mean-Variance Estimation -- as a means of capturing the uncertainty in neural network predictions of turbine gas temperature. Each approach is implemented within a unified experimental framework that employs cross-validation for hyperparameter selection, repeated train-test splits for performance robustness, and multiple metrics to evaluate both the accuracy and tightness of the intervals. In particular, Coverage Probability, Normalized Mean Prediction Interval Width, and the Coverage Width-based Criterion are measured to comprehensively assess each method's reliability and sharpness. Experiments conducted on a representative turbine gas temperature dataset reveal distinct trade-offs among the five methods in terms of interval coverage, width, and stability. These findings provide a practical guide for selecting and tuning prediction interval methods in engine health management and prognostics, ensuring both interpretability and precision in real-world applications.
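Of the five approaches, the bootstrap is the easiest to illustrate without a deep-learning framework. The sketch below makes two simplifying assumptions that differ from a typical neural-network setup. First, each ensemble member is a straight-line least-squares fit rather than a network. Second, the noise variance is a single constant estimated from residuals, instead of coming from a second model trained on those residuals. The interval combines the spread of the ensemble (model uncertainty) with that noise variance and uses a normal quantile. Names and defaults are hypothetical.

```python
import random
import statistics
from statistics import NormalDist


def fit_line(xs, ys):
    """OLS fit of y = a + b*x; a stand-in for a neural network ensemble member."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def bootstrap_interval(data, n_models=50, level=0.95, seed=0):
    """Return a function x -> (lower, upper) built from a bootstrap ensemble of line fits."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        resample = [rng.choice(data) for _ in data]  # sample with replacement
        xs, ys = zip(*resample)
        models.append(fit_line(xs, ys))

    def ensemble(x):
        preds = [a + b * x for a, b in models]
        return statistics.fmean(preds), statistics.variance(preds)  # mean, model variance

    # Constant noise variance: squared residual left after removing model variance.
    residual_terms = []
    for x, y in data:
        mean, model_var = ensemble(x)
        residual_terms.append(max((y - mean) ** 2 - model_var, 0.0))
    noise_var = statistics.fmean(residual_terms)
    z = NormalDist().inv_cdf(0.5 + level / 2)

    def interval(x):
        mean, model_var = ensemble(x)
        half = z * (model_var + noise_var) ** 0.5
        return mean - half, mean + half

    return interval


# Usage on synthetic data (a noisy line standing in for TGT measurements).
gen = random.Random(3)
data = [(x / 10, 1.0 + 2.0 * (x / 10) + gen.gauss(0.0, 1.0)) for x in range(200)]
interval = bootstrap_interval(data)
print(interval(5.0))
```

The design choice that matters is the split of total variance into an ensemble term and a noise term. Dropping the noise term yields a confidence interval for the mean prediction, which is far too narrow to cover individual TGT readings.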

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper benchmarks five uncertainty quantification methods (Delta method, Bayesian Monte Carlo Dropout, Bootstrap, Lower-Upper Bound Estimation, and Mean-Variance Estimation) for neural network predictions of turbine gas temperature degradation. It implements each within a unified experimental framework that uses cross-validation for hyperparameter selection and repeated train-test splits for robustness, evaluates them with Coverage Probability, Normalized Mean Prediction Interval Width, and the Coverage Width-based Criterion on a representative dataset, and identifies distinct trade-offs in coverage, width, and stability to guide method selection in engine health management.

Significance. If the empirical comparisons hold, the work offers a practical, reproducible guide for UQ method selection in safety-critical prognostics, where the reliability and sharpness of intervals directly inform maintenance and safety decisions. The unified framework with repeated splits and multiple complementary metrics (PICP, NMPIW, CWC) is a clear strength, as is the focus on a domain-specific dataset; together these support falsifiable, actionable findings rather than isolated method claims.

minor comments (3)
  1. [Abstract, §1] The claim of 'distinct trade-offs' would be strengthened by including one or two key quantitative results (e.g., specific PICP/NMPIW values or rankings) rather than remaining purely qualitative.
  2. [§3, §4] The manuscript should explicitly state the neural network architecture, training hyperparameters, and any data preprocessing steps used across all methods to ensure full reproducibility of the unified framework.
  3. [Figures 3-5] Figure captions and axis labels for the trade-off plots should include the exact number of repeated splits and the dataset size to allow readers to assess statistical stability without returning to the text.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. No specific major comments were provided in the report, so we have no points requiring direct response or manuscript changes at this time.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper performs an empirical benchmarking of five standard uncertainty quantification methods (Delta, Bayesian MC Dropout, Bootstrap, LUBE, MVE) on a turbine gas temperature dataset. It applies cross-validation, repeated splits, and external metrics (coverage probability, NMPIW, CWC) without any derivations, first-principles predictions, or equations that could reduce to fitted inputs. No self-citations serve as load-bearing uniqueness theorems or ansatzes; the methods are implemented directly from the literature and evaluated against independent performance criteria. The central claims concern observed trade-offs under a unified framework and do not rely on any self-referential construction.
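As an illustration of one of these literature methods, Mean-Variance Estimation (Nix and Weigend) trains a model that outputs both a mean and a variance by minimizing the Gaussian negative log-likelihood. The sketch below is a toy under stated assumptions. Linear heads replace the network, and the log-variance is linear in a single input. Training is staged: least squares for the mean first, then gradient descent on the likelihood for the variance with the mean held fixed, a common way to avoid the instability of fitting both at once. Function names, learning rate, and iteration count are hypothetical.

```python
import math
import random
import statistics
from statistics import NormalDist


def fit_mve_linear(xs, ys, lr=0.5, n_iters=1000):
    """Toy mean-variance estimation with linear heads (not a neural network).

    mean(x) = a + b*x and log variance s(x) = c + d*x.
    Stage 1: ordinary least squares for the mean.
    Stage 2: gradient descent on the per-point Gaussian negative log-likelihood
             0.5 * (s + r**2 * exp(-s)) over (c, d), with the mean held fixed.
    """
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    sq_res = [(y - (a + b * x)) ** 2 for x, y in zip(xs, ys)]

    c = d = 0.0
    n = len(xs)
    for _ in range(n_iters):
        grad_c = grad_d = 0.0
        for x, r2 in zip(xs, sq_res):
            g = 0.5 * (1.0 - r2 * math.exp(-(c + d * x)))  # dNLL/ds at this point
            grad_c += g
            grad_d += g * x
        c -= lr * grad_c / n
        d -= lr * grad_d / n
    return a, b, c, d


def mve_interval(params, x, level=0.95):
    """Gaussian interval mean(x) +/- z * sigma(x) from the fitted heads."""
    a, b, c, d = params
    z = NormalDist().inv_cdf(0.5 + level / 2)
    sigma = math.exp(0.5 * (c + d * x))
    return a + b * x - z * sigma, a + b * x + z * sigma


# Usage on synthetic data whose noise grows with x (heteroscedastic).
gen = random.Random(7)
xs = [-1.0 + 2.0 * i / 399 for i in range(400)]
ys = [1.0 + 2.0 * x + gen.gauss(0.0, math.exp(0.5 * (-1.0 + 1.5 * x))) for x in xs]
params = fit_mve_linear(xs, ys)
print(mve_interval(params, -1.0), mve_interval(params, 1.0))
```

Unlike the bootstrap, MVE captures input-dependent noise directly, so its intervals widen where the data are noisier; it does not, on its own, account for uncertainty in the fitted mean.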

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical derivations, free parameters, axioms, or invented entities appear in the abstract; the work is purely empirical benchmarking of known techniques.

pith-pipeline@v0.9.1-grok · 5742 in / 1038 out tokens · 25757 ms · 2026-06-29T08:31:31.160332+00:00 · methodology

