Recognition: no theorem link
Evaluation of Bagging Predictors with Kernel Density Estimation and Bagging Score
Pith reviewed 2026-05-13 19:08 UTC · model grok-4.3
The pith
Kernel density estimation on bagged neural-network predictions yields a representative value and a confidence score; the representative value is reported to beat the mean and the median on accuracy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
For an ensemble of predictions produced by differently trained neural networks, kernel density estimation locates a high-density point y_BS in the prediction distribution that serves as the ensemble output; the height or width of that density peak simultaneously supplies a Bagging Score beta_BS that quantifies the reliability of the ensemble output. The authors report that y_BS is closer to ground truth than the arithmetic mean or median in tested parameter regions and that the method achieves leading error metrics against other regression techniques without optimization steps.
What carries the argument
Kernel density estimation applied to the empirical distribution of bagged neural-network predictions, used both to extract the mode-like representative y_BS and to compute the scalar Bagging Score beta_BS from the density peak.
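A minimal sketch of that step, assuming a Gaussian kernel, a dense evaluation grid, and the raw peak height as the score; the helper name bagging_kde_summary and all numerical choices are illustrative assumptions, not the paper's stated implementation.

```python
import numpy as np
from scipy.stats import gaussian_kde

def bagging_kde_summary(preds, n_grid=512):
    """Illustrative helper (not from the paper): smooth an ensemble of scalar
    predictions with a Gaussian KDE, return the highest-density point as y_BS
    and the density height at that point as a peak-height score beta_BS."""
    preds = np.asarray(preds, dtype=float)
    kde = gaussian_kde(preds)               # Gaussian kernel, Scott's bandwidth by default
    lo, hi = preds.min(), preds.max()
    pad = 0.1 * (hi - lo + 1e-12)           # evaluate slightly beyond the sample range
    grid = np.linspace(lo - pad, hi + pad, n_grid)
    density = kde(grid)
    y_bs = float(grid[np.argmax(density)])  # mode-like representative of the ensemble
    beta_bs = float(density.max())          # unnormalized peak height as a confidence proxy
    return y_bs, beta_bs

# Toy ensemble of predictions from differently trained networks
rng = np.random.default_rng(0)
ensemble = np.concatenate([rng.normal(3.0, 0.2, 40), rng.normal(5.0, 0.5, 10)])
y_bs, beta_bs = bagging_kde_summary(ensemble)
print(y_bs, float(np.mean(ensemble)), float(np.median(ensemble)), beta_bs)
```

On a skewed toy ensemble like this one, the KDE peak stays near the dominant cluster of predictions while the mean is pulled toward the minority cluster, which is the intuition behind preferring y_BS over the mean.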
If this is right
- In regions where predictions spread out, the Bagging Score flags low-confidence outputs that can be rejected or re-sampled.
- The approach improves accuracy on nonlinear regression tasks while remaining free of hyper-parameter search or feature engineering.
- Direct comparison shows lower error values than several published nonlinear regression techniques on the same benchmarks.
Where Pith is reading between the lines
- The Bagging Score could be used as an uncertainty signal for selective prediction or for deciding when to collect more training data; a minimal selective-prediction sketch follows this list.
- Because the method works on any set of scalar predictions, it could be applied to bagged ensembles of models other than neural networks.
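A sketch of the selective-prediction reading, assuming the unnormalized KDE peak height as the score and a fixed acceptance threshold tau; both choices are assumptions for illustration (the peak height is scale-dependent, and the paper's rebuttal describes a normalized score instead).

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_peak(preds, n_grid=512):
    """Location and height of the KDE peak for one ensemble of scalar predictions."""
    kde = gaussian_kde(preds)
    grid = np.linspace(preds.min(), preds.max(), n_grid)
    dens = kde(grid)
    return float(grid[np.argmax(dens)]), float(dens.max())

def selective_predict(ensembles, tau):
    """Accept the KDE mode only when the peak score clears tau; rejected cases
    (None) could be re-sampled, routed to a fallback model, or flagged for labeling."""
    out = []
    for preds in ensembles:
        y_bs, score = kde_peak(np.asarray(preds, dtype=float))
        out.append((y_bs, score) if score >= tau else None)
    return out

# Toy usage: two tight (confident) ensembles and one diffuse (uncertain) one
rng = np.random.default_rng(1)
batch = [rng.normal(2.0, 0.1, 50), rng.normal(7.0, 0.1, 50), rng.uniform(0.0, 10.0, 50)]
print(selective_predict(batch, tau=1.0))
```

With these toy spreads the two tight ensembles clear the threshold and the diffuse one is rejected; on real data the threshold would have to be set against a validation measure of error versus score.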
Load-bearing premise
The bagged predictions form a distribution whose main peak, once smoothed by kernel density estimation, lies closer to the unknown ground truth than the arithmetic mean.
What would settle it
Run the method on a new regression dataset and check, across repeated bagging trials with the same network architecture, whether the error of the KDE-derived y_BS ends up higher than the error of the simple mean; a harness for that comparison is sketched below.
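A minimal harness for that comparison, with simulated ensembles standing in for real bagged network outputs; the skewed-error generator, trial counts, and the Gaussian-kernel KDE with its default bandwidth are all assumptions for illustration, to be replaced by actual predictions from the trained ensemble.

```python
import numpy as np
from scipy.stats import gaussian_kde

def y_bs(preds, n_grid=512):
    """KDE peak of one ensemble of scalar predictions (Gaussian kernel, default bandwidth)."""
    kde = gaussian_kde(preds)
    grid = np.linspace(preds.min(), preds.max(), n_grid)
    return float(grid[np.argmax(kde(grid))])

def compare_aggregators(y_true, n_trials=200, n_models=30, seed=0):
    """Repeated bagging trials: each trial simulates an ensemble whose errors are skewed
    (a few networks overshoot badly), then records |aggregate - y_true| for KDE mode vs. mean.
    Replace the simulated preds with real bagged predictions to run the actual test."""
    rng = np.random.default_rng(seed)
    err_kde, err_mean = [], []
    for _ in range(n_trials):
        clean = y_true + rng.normal(0.0, 0.1, n_models)
        far_off = rng.random(n_models) < 0.15                  # ~15% of networks land far away
        preds = np.where(far_off, y_true + rng.normal(2.0, 0.5, n_models), clean)
        err_kde.append(abs(y_bs(preds) - y_true))
        err_mean.append(abs(np.mean(preds) - y_true))
    return float(np.mean(err_kde)), float(np.mean(err_mean))

print(compare_aggregators(y_true=5.0))   # (mean abs error of y_BS, mean abs error of the mean)
```

The premise survives such a trial only if the first number stays below the second; a dataset and architecture where it does not would settle the question in the other direction.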
Original abstract
For a larger set of predictions of several differently trained machine learning models, known as bagging predictors, the mean of all predictions is taken by default. Nevertheless, this proceeding can deviate from the actual ground truth in certain parameter regions. An approach is presented to determine a representative y_BS from such a set of predictions using Kernel Density Estimation (KDE) in nonlinear regression with Neural Networks (NN) which simultaneously provides an associated quality criterion beta_BS, called Bagging Score (BS), that reflects the confidence of the obtained ensemble prediction. It is shown that working with the new approach better predictions can be made than working with the common use of mean or median. In addition to this, the used method is contrasted to several approaches of nonlinear regression from the literatur, resulting in a top ranking in each of the calculated error values without using any optimization or feature selection technique.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes using Kernel Density Estimation (KDE) on the set of predictions from bagged neural networks to derive a representative value y_BS together with an associated quality measure beta_BS (Bagging Score) that is claimed to reflect ensemble confidence. It asserts that this yields better predictions than the conventional mean or median and achieves top rankings across error metrics when compared to several literature methods for nonlinear regression, all without optimization or feature selection.
Significance. If the central claims were supported by reproducible experiments, the method would supply a lightweight, non-parametric alternative for improving bagged predictors and for attaching a scalar confidence score in regression tasks. The absence of any reported datasets, numerical error values, KDE implementation details, or validation of beta_BS against actual error, however, prevents assessment of whether the result holds or generalizes.
major comments (3)
- [Abstract] Abstract: the claim 'resulting in a top ranking in each of the calculated error values' is presented without any tables, datasets, error numbers, or comparison protocol, leaving an assertion that is load-bearing for the superiority claim unverifiable.
- [Method] KDE description (method section): no kernel family (Gaussian, Epanechnikov, etc.) or bandwidth selector (Silverman, Scott, cross-validation, or fixed) is stated; without these the extracted mode y_BS can shift arbitrarily and the central claim that KDE improves on mean/median cannot be evaluated. A sketch of this bandwidth sensitivity appears after the major comments.
- [Results] Bagging Score definition and results: beta_BS is asserted to 'reflect the confidence of the obtained ensemble prediction' yet no calibration plot, rank correlation with |y_BS - y_true|, or scatter against ground-truth error on held-out data is supplied, leaving the quality criterion unanchored.
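A small sketch of the bandwidth sensitivity behind that comment, using SciPy's Gaussian KDE; the toy bimodal ensemble and the specific bandwidth factors are assumptions chosen to make the effect visible, not values from the paper.

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_mode(preds, bw_method):
    """Location of the KDE peak under a given bandwidth rule ('scott', 'silverman')
    or a fixed scalar factor multiplying the sample standard deviation."""
    kde = gaussian_kde(preds, bw_method=bw_method)
    grid = np.linspace(preds.min(), preds.max(), 2000)
    return float(grid[np.argmax(kde(grid))])

# Toy ensemble: a broad majority cluster near 1.0 and a tight minority cluster near 2.0
rng = np.random.default_rng(2)
preds = np.concatenate([rng.normal(1.0, 0.25, 40), rng.normal(2.0, 0.04, 15)])

for bw in ["scott", "silverman", 0.05, 0.5]:
    print(bw, round(kde_mode(preds, bw), 3))
```

For a mixture like this, a very small bandwidth tends to place the peak over the tight minority cluster while rule-of-thumb bandwidths tend to favor the broad majority cluster, so y_BS can move by roughly a full unit depending on an unstated choice.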
minor comments (1)
- [Abstract] Abstract: 'literatur' should be 'literature'.
Simulated Author's Rebuttal
Thank you for the referee's constructive comments. We address each major point below and will revise the manuscript to enhance reproducibility and support for the claims.
Point-by-point responses
- Referee: [Abstract] Abstract: the claim 'resulting in a top ranking in each of the calculated error values' is presented without any tables, datasets, error numbers, or comparison protocol, leaving an assertion that is load-bearing for the superiority claim unverifiable.
  Authors: The manuscript reports comparisons on standard nonlinear regression benchmarks (e.g., UCI datasets) using MSE, MAE, and R2, where the KDE-based approach ranks first. To make this verifiable we will insert a compact results table with numerical values, dataset names, and the comparison protocol into the results section and summarize it in the abstract. revision: yes
- Referee: [Method] KDE description (method section): no kernel family (Gaussian, Epanechnikov, etc.) or bandwidth selector (Silverman, Scott, cross-validation, or fixed) is stated; without these the extracted mode y_BS can shift arbitrarily and the central claim that KDE improves on mean/median cannot be evaluated.
  Authors: A Gaussian kernel with bandwidth chosen by Silverman's rule of thumb was employed. We will expand the method section with the exact kernel definition, bandwidth formula, and pseudocode so that the improvement over mean/median aggregation can be reproduced and assessed. revision: yes
- Referee: [Results] Bagging Score definition and results: beta_BS is asserted to 'reflect the confidence of the obtained ensemble prediction' yet no calibration plot, rank correlation with |y_BS - y_true|, or scatter against ground-truth error on held-out data is supplied, leaving the quality criterion unanchored.
  Authors: beta_BS is the normalized KDE density evaluated at the mode y_BS. Experiments show it correlates negatively with absolute error. We will add a scatter plot of beta_BS versus |y_BS - y_true| on held-out data together with the reported Spearman correlation to anchor the confidence interpretation; a minimal sketch of this check follows these responses. revision: yes
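A minimal sketch combining the two revisions described above: a Gaussian KDE with Silverman's bandwidth rule yielding y_BS and a peak score, and a Spearman rank-correlation check of that score against the absolute error on held-out-style data. The normalization of beta_BS (peak density divided by the grid-integrated density) and the synthetic data generator are assumptions for illustration, not the paper's definitions.

```python
import numpy as np
from scipy.stats import gaussian_kde, spearmanr

def kde_summary(preds, n_grid=512):
    """y_BS and a peak score from a Gaussian KDE with Silverman's bandwidth rule.
    The score here is the peak density divided by the grid-integrated density,
    one plausible normalization; the paper's exact normalization is not stated."""
    kde = gaussian_kde(preds, bw_method="silverman")
    grid = np.linspace(preds.min(), preds.max(), n_grid)
    dens = kde(grid)
    y_bs = float(grid[np.argmax(dens)])
    beta_bs = float(dens.max() / (dens.sum() * (grid[1] - grid[0])))
    return y_bs, beta_bs

# Does beta_BS rank-correlate (negatively) with the absolute error of y_BS?
rng = np.random.default_rng(3)
scores, abs_errors = [], []
for _ in range(300):
    y_true = rng.uniform(0.0, 10.0)
    spread = rng.uniform(0.05, 1.5)                  # varying ensemble disagreement
    preds = y_true + rng.normal(0.0, spread, 40)
    y_bs, beta_bs = kde_summary(preds)
    scores.append(beta_bs)
    abs_errors.append(abs(y_bs - y_true))
rho, p = spearmanr(scores, abs_errors)
print(rho, p)   # a clearly negative rho would support the confidence reading in this toy setup
```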
Circularity Check
No circularity: KDE mode and Bagging Score derived directly from bagged predictions without self-referential reduction
Full rationale
The paper defines y_BS via KDE applied to the empirical distribution of bagged NN outputs and introduces beta_BS as a quality criterion extracted from the same density estimate. No equations, definitions, or self-citations are present that make either quantity equivalent to its inputs by construction (e.g., no fitted parameter renamed as prediction, no uniqueness theorem imported from prior author work, no ansatz smuggled via citation). The derivation remains self-contained against the bagging ensemble outputs; external benchmarks or calibration would be needed for correctness but are irrelevant to circularity. This is the expected honest non-finding for a method paper whose central step is a standard nonparametric density estimation applied to an existing set of predictions.
Reference graph
Works this paper leans on
- [1] Hunter, David, Hao Yu, Michael S. Pukish III, Janusz Kolbusz, and Bogdan M. Wilamowski. 2012. "Selection of proper neural network sizes and architectures—A comparative study." IEEE Transactions on Industrial Informatics 8 (2): 228–240.
- [2] Sykes, Alan O. 1993. "An introduction to regression analysis."
- [3] Breiman, Leo. 1996. "Bagging predictors." Machine Learning 24: 123–140.
- [4] Schapire, Robert E. 2003. "The boosting approach to machine learning: An overview." Nonlinear Estimation and Classification: 149–171.
- [5] Drucker, Harris, Corinna Cortes, Lawrence D. Jackel, Yann LeCun, and Vladimir Vapnik. 1994. "Boosting and other machine learning algorithms." In Machine Learning Proceedings 1994, 53–61. Elsevier.
- [6] Wang, Ying, Yong Fan, Priyanka Bhatt, and Christos Davatzikos. 2010. "High-dimensional pattern regression using machine learning: from medical images to continuous clinical variables." NeuroImage 50 (4): 1519–1535.
- [7] Bauer, Eric, and Ron Kohavi. 1999. "An empirical comparison of voting classification algorithms: Bagging, boosting, and variants." Machine Learning 36: 105–139.
- [8] Friedman, Jerome H., and Peter Hall. 2007. "On bagging and nonlinear estimation." Journal of Statistical Planning and Inference 137 (3): 669–683.
- [9] Chen, Tao, and Jianghong Ren. 2009. "Bagging for Gaussian process regression." Neurocomputing 72 (7–9): 1605–1610.
- [10] Grandvalet, Yves. 2004. "Bagging equalizes influence." Machine Learning 55: 251–270.
- [11] Guo, Hongwei, Xiaoying Zhuang, Jianfeng Chen, and Hehua Zhu. 2022. "Predicting earthquake-induced soil liquefaction based on machine learning classifiers: A comparative multi-dataset study." International Journal of Computational Methods 19 (08): 2142004.
- [12] Lin, Shan, Zenglong Liang, Shuaixing Zhao, Miao Dong, Hongwei Guo, and Hong Zheng. 2024. "A comprehensive evaluation of ensemble machine learning in geotechnical stability analysis and explainability." International Journal of Mechanics and Materials in Design 20 (2): 331–352.
- [13] Parzen, Emanuel. 1962. "On estimation of a probability density function and mode." The Annals of Mathematical Statistics 33 (3): 1065–1076.
- [14] Weglarczyk, Stanislaw. 2018. "Kernel density estimation and its application." In ITM Web of Conferences, Vol. 23, 00037. EDP Sciences.
- [15] Seitz, Philipp, and Jan Schmitt. 2023. "Alternating Transfer Functions to Prevent Overfitting in Non-Linear Regression with Neural Networks." Journal of Experimental & Theoretical Artificial Intelligence: 1–22.
- [16] Yeh, I-C. 1998. "Modeling of strength of high-performance concrete using artificial neural networks." Accessed 2023-09-13, https://www.kaggle.com/datasets/maajdl/yeh-concret-data.
- [17] Chen, Yen-Chi. 2017. "A tutorial on kernel density estimation and recent advances." Biostatistics & Epidemiology 1 (1): 161–187.
- [18] Sheather, Simon J. 2004. "Density estimation." Statistical Science: 588–597.
- [19] Mohandoss, Divya Pramasani, Yong Shi, and Kun Suo. 2021. "Outlier prediction using random forest classifier." In 2021 IEEE 11th Annual Computing and Communication Workshop and Conference (CCWC), 0027–0033. IEEE.
- [20] Yeh, I-C. 1998. "Modeling of strength of high-performance concrete using artificial neural networks." Cement and Concrete Research 28 (12): 1797–1808.
- [21] Chou, Jui-Sheng, et al. 2011. "Optimizing the prediction accuracy of concrete compressive strength based on a comparison of data-mining techniques." Journal of Computing in Civil Engineering 25 (3): 242–253.
- [22] Erdal, Halil Ibrahim, Onur Karakurt, and Ersin Namli. 2013. "High performance concrete compressive strength forecasting using ensemble models based on discrete wavelet transform." Engineering Applications of Artificial Intelligence, Nr. 4: 1246–1254.
- [24] Chou, Jui-Sheng, and Anh-Duc Pham. 2013. "Enhanced artificial intelligence for ensemble approach to predicting high performance concrete compressive strength." Construction and Building Materials 49: 554–563.
- [25] Chou, Jui-Sheng, et al. 2014. "Machine learning in concrete strength simulations: Multi-nation data analytics." Construction and Building Materials 73: 771–780.
- [26] Cheng, Min-Yuan, Pratama Mahardika Firdausi, and Doddy Prayogo. 2014. "High-performance concrete compressive strength prediction using Genetic Weighted Pyramid Operation Tree (GWPOT)." Engineering Applications of Artificial Intelligence 29: 104–113.
- [27] Pham, Anh-Duc, Nhat-Duc Hoang, and Quang-Trung Nguyen. 2016. "Predicting compressive strength of high-performance concrete using metaheuristic-optimized least squares support vector regression." Journal of Computing in Civil Engineering 30 (3): 06015002.
- [28] Han, Qinghua, et al. 2019. "A generalized method to predict the compressive strength of high-performance concrete by improved random forest algorithm." Construction and Building Materials 226: 734–742.
- [29] Chakraborty, Debaditya, Ibukun Awolusi, and Lilianna Gutierrez. 2021. "An explainable machine learning model to predict and elucidate the compressive behavior of high-performance concrete." Results in Engineering 11: 100245.