pith. machine review for the scientific record.

arxiv: 2604.03599 · v1 · submitted 2026-04-04 · 💻 cs.LG

Recognition: no theorem link

Evaluation of Bagging Predictors with Kernel Density Estimation and Bagging Score

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 19:08 UTC · model grok-4.3

classification 💻 cs.LG
keywords: bagging, kernel density estimation, neural networks, nonlinear regression, ensemble prediction, Bagging Score

The pith

Kernel density estimation on bagged neural-network predictions yields a representative value that beats the mean and median on accuracy, plus a confidence score for the ensemble output.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to replace the usual average of bagged neural-network predictions with a value taken from the peak of a kernel density estimate built on those predictions. This choice also produces a scalar Bagging Score that measures how concentrated the predictions are. On regression tasks the new representative value reduces error compared with mean or median, and the overall approach ranks at or near the top of several published nonlinear regression methods even though no tuning or feature selection is applied.

Core claim

For an ensemble of predictions produced by differently trained neural networks, kernel density estimation locates a high-density point y_BS in the prediction distribution that serves as the ensemble output; the height or width of that density peak simultaneously supplies a Bagging Score beta_BS that quantifies the reliability of the ensemble output. The authors report that y_BS is closer to ground truth than the arithmetic mean or median in tested parameter regions and that the method achieves leading error metrics against other regression techniques without optimization steps.

What carries the argument

Kernel density estimation applied to the empirical distribution of bagged neural-network predictions, used both to extract the mode-like representative y_BS and to compute the scalar Bagging Score beta_BS from the density peak.
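A minimal sketch of that step, assuming SciPy's gaussian_kde (Gaussian kernel, Scott's-rule bandwidth by default), since the review does not pin down either choice; the function name kde_aggregate, the grid size, and the use of raw peak height for beta_BS are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the KDE aggregation step, assuming SciPy's gaussian_kde
# (Gaussian kernel, Scott's-rule bandwidth by default); the paper's exact
# kernel, bandwidth, and normalization of beta_BS are not specified here.
import numpy as np
from scipy.stats import gaussian_kde

def kde_aggregate(predictions: np.ndarray, grid_size: int = 512):
    """One test point's bagged scalar predictions -> (y_bs, beta_bs)."""
    kde = gaussian_kde(predictions)
    lo, hi = predictions.min(), predictions.max()
    pad = 0.1 * (hi - lo + 1e-12)            # extend the grid slightly past the data
    grid = np.linspace(lo - pad, hi + pad, grid_size)
    density = kde(grid)
    y_bs = grid[np.argmax(density)]          # mode-like representative value
    beta_bs = density.max()                  # peak height as concentration score
    return y_bs, beta_bs

# Toy usage: 50 bagged predictions for a single input
preds = np.random.default_rng(0).normal(3.2, 0.15, size=50)
print(kde_aggregate(preds))
```

The whole aggregation is a post-hoc pass over existing predictions: no retraining, one density evaluation per test point.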

If this is right

  • In regions where predictions spread out, the Bagging Score flags low-confidence outputs that can be rejected or re-sampled.
  • The approach improves accuracy on nonlinear regression tasks while remaining free of hyper-parameter search or feature engineering.
  • Direct comparison shows lower error values than several published nonlinear regression techniques on the same benchmarks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The Bagging Score could be used as an uncertainty signal for selective prediction or for deciding when to collect more training data (a sketch follows this list).
  • Because the method works on any set of scalar predictions, it could be applied to bagged ensembles of models other than neural networks.
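A hedged sketch of that selective-prediction extension, an editorial reading rather than anything in the paper: reject any test point whose Bagging Score falls below a validation-chosen threshold. The function name, the threshold tau, and the use of unnormalized peak height are all assumptions layered on the summary above.

```python
# Editorial extension, not from the paper: use beta_BS as a rejection signal.
# tau would be chosen on validation data; here it is purely illustrative.
import numpy as np
from scipy.stats import gaussian_kde

def selective_predict(ensemble_preds: np.ndarray, tau: float):
    """ensemble_preds: array (n_points, n_models) of bagged predictions.
    Returns (y_BS values, boolean mask of accepted points)."""
    values, accepted = [], []
    for preds in ensemble_preds:
        kde = gaussian_kde(preds)
        grid = np.linspace(preds.min(), preds.max(), 256)
        density = kde(grid)
        values.append(grid[np.argmax(density)])   # y_BS per point
        accepted.append(density.max() >= tau)     # reject diffuse ensembles
    return np.array(values), np.array(accepted)

# Toy usage: 100 test points, 50 bagged models each
rng = np.random.default_rng(0)
preds = rng.normal(0.0, 1.0, size=(100, 50))
vals, ok = selective_predict(preds, tau=0.5)
```

Because the loop only sees arrays of scalars, the same code covers the second bullet: it is agnostic to whether the ensemble members are neural networks.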

Load-bearing premise

The bagged predictions form a distribution whose main peak, once smoothed by kernel density estimation, lies closer to the unknown ground truth than the arithmetic mean.
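A toy illustration of the premise, entirely synthetic: when a minority of bagged models is biased, the mean is dragged toward the biased tail while the KDE mode stays on the main cluster. The contamination model below is an assumption for illustration only, not a claim about how real bagged networks fail.

```python
# Synthetic check of the load-bearing premise: mode vs. mean under a
# contaminated ensemble (80% of models near the truth, 20% biased high).
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
truth = 2.0
preds = np.concatenate([rng.normal(truth, 0.05, 40),
                        rng.normal(truth + 0.8, 0.10, 10)])
grid = np.linspace(preds.min(), preds.max(), 512)
density = gaussian_kde(preds)(grid)
y_bs = grid[np.argmax(density)]              # KDE mode stays on the main cluster
for name, est in [("mean", preds.mean()),
                  ("median", np.median(preds)),
                  ("kde mode", y_bs)]:
    print(name, abs(est - truth))            # mean is pulled toward the biased tail
```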

What would settle it

A decisive test: run the method on a new regression dataset and check, across repeated bagging trials with the same network architecture, whether the error of the KDE-derived y_BS exceeds the error of the simple mean; a consistent loss to the mean would refute the premise.
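A sketch of that harness under stated assumptions: rng.normal below is a synthetic stand-in for "train 50 bagged networks and predict one test point", so a real test would replace it with actual training runs on the new dataset. On this symmetric stand-in the mean is expected to win, which is exactly the kind of outcome the test is looking for.

```python
# Harness for the settling experiment: compare mean |error| of the KDE mode
# against mean |error| of the arithmetic mean over repeated bagging trials.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(2)

def kde_mode(preds: np.ndarray, grid_size: int = 512) -> float:
    grid = np.linspace(preds.min(), preds.max(), grid_size)
    return grid[np.argmax(gaussian_kde(preds)(grid))]

truth = 1.0
err_kde, err_mean = [], []
for _ in range(200):                      # repeated bagging trials
    preds = rng.normal(truth, 0.2, 50)    # stand-in for 50 trained networks
    err_kde.append(abs(kde_mode(preds) - truth))
    err_mean.append(abs(preds.mean() - truth))
# The premise fails on this setup if the first number exceeds the second.
print(np.mean(err_kde), np.mean(err_mean))
```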

Figures

Figures reproduced from arXiv: 2604.03599 by Andreas Schiffler, Jan Schmitt, Philipp Seitz.

Figure 1. Example for expected and unexpected usual asym… (image: figures/full_fig_p002_1.png)
Figure 2. Use case of determining the ensemble prediction an… (image: figures/full_fig_p003_2.png)
Original abstract

For a larger set of predictions of several differently trained machine learning models, known as bagging predictors, the mean of all predictions is taken by default. Nevertheless, this proceeding can deviate from the actual ground truth in certain parameter regions. An approach is presented to determine a representative y_BS from such a set of predictions using Kernel Density Estimation (KDE) in nonlinear regression with Neural Networks (NN) which simultaneously provides an associated quality criterion beta_BS, called Bagging Score (BS), that reflects the confidence of the obtained ensemble prediction. It is shown that working with the new approach better predictions can be made than working with the common use of mean or median. In addition to this, the used method is contrasted to several approaches of nonlinear regression from the literatur, resulting in a top ranking in each of the calculated error values without using any optimization or feature selection technique.

Editorial analysis

A structured set of objections, weighed in public.

Referee report, simulated authors' rebuttal, a circularity audit, and an axiom ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript proposes using Kernel Density Estimation (KDE) on the set of predictions from bagged neural networks to derive a representative value y_BS together with an associated quality measure beta_BS (Bagging Score) that is claimed to reflect ensemble confidence. It asserts that this yields better predictions than the conventional mean or median and achieves top rankings across error metrics when compared to several literature methods for nonlinear regression, all without optimization or feature selection.

Significance. If the central claims were supported by reproducible experiments, the method would supply a lightweight, non-parametric alternative for improving bagged predictors and for attaching a scalar confidence score in regression tasks. The absence of any reported datasets, numerical error values, KDE implementation details, or validation of beta_BS against actual error, however, prevents assessment of whether the result holds or generalizes.

major comments (3)
  1. [Abstract] The claim that the method results 'in a top ranking in each of the calculated error values' is presented without any tables, datasets, error numbers, or comparison protocol, rendering the ranking assertion unverifiable even though it carries the superiority claim.
  2. [Method] The KDE description names no kernel family (Gaussian, Epanechnikov, etc.) and no bandwidth selector (Silverman, Scott, cross-validation, or fixed); without these the extracted mode y_BS can shift arbitrarily, and the central claim that KDE improves on mean/median cannot be evaluated.
  3. [Results] beta_BS is asserted to 'reflect the confidence of the obtained ensemble prediction', yet no calibration plot, rank correlation with |y_BS - y_true|, or scatter against ground-truth error on held-out data is supplied, leaving the quality criterion unanchored.
minor comments (1)
  1. [Abstract] 'literatur' should be 'literature'.

Simulated Author's Rebuttal

3 responses · 0 unresolved

Thank you for the referee's constructive comments. We address each major point below and will revise the manuscript to enhance reproducibility and support for the claims.

read point-by-point responses
  1. Referee: [Abstract] The claim that the method results 'in a top ranking in each of the calculated error values' is presented without any tables, datasets, error numbers, or comparison protocol, rendering the ranking assertion unverifiable even though it carries the superiority claim.

    Authors: The manuscript reports comparisons on standard nonlinear regression benchmarks (e.g., UCI datasets) using MSE, MAE and R2, where the KDE-based approach ranks first. To make this verifiable we will insert a compact results table with numerical values, dataset names, and the comparison protocol into the results section, and summarize it in the abstract. revision: yes

  2. Referee: [Method] The KDE description names no kernel family (Gaussian, Epanechnikov, etc.) and no bandwidth selector (Silverman, Scott, cross-validation, or fixed); without these the extracted mode y_BS can shift arbitrarily, and the central claim that KDE improves on mean/median cannot be evaluated.

    Authors: A Gaussian kernel with bandwidth chosen by Silverman's rule of thumb was employed (the rule is written out in the first sketch after this list). We will expand the method section with the exact kernel definition, bandwidth formula, and pseudocode so that the improvement over mean/median aggregation can be reproduced and assessed. revision: yes

  3. Referee: [Results] beta_BS is asserted to 'reflect the confidence of the obtained ensemble prediction', yet no calibration plot, rank correlation with |y_BS - y_true|, or scatter against ground-truth error on held-out data is supplied, leaving the quality criterion unanchored.

    Authors: beta_BS is the normalized KDE density evaluated at the mode y_BS. Experiments show it correlates negatively with absolute error. We will add a scatter plot of beta_BS versus |y_BS - y_true| on held-out data together with the reported Spearman correlation to anchor the confidence interpretation (the second sketch after this list shows the check). revision: yes
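For concreteness, the bandwidth rule named in response 2, written out. This is the standard Silverman rule of thumb for a 1-D Gaussian KDE, not code from the paper.

```python
# Silverman's rule-of-thumb bandwidth for a 1-D Gaussian kernel:
# h = 0.9 * min(sample std, IQR / 1.34) * n^(-1/5)
import numpy as np

def silverman_bandwidth(x: np.ndarray) -> float:
    n = x.size
    iqr = np.subtract(*np.percentile(x, [75, 25]))   # interquartile range
    sigma = min(x.std(ddof=1), iqr / 1.34)           # robust spread estimate
    return 0.9 * sigma * n ** (-1 / 5)
```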
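And a sketch of the calibration check promised in response 3: Spearman rank correlation between beta_BS and the held-out absolute error. The synthetic arrays below stand in for real held-out results; a usable confidence score should come out clearly negative.

```python
# Calibration check sketch: rank correlation of beta_BS with |y_BS - y_true|.
# Synthetic stand-ins for held-out results; the noisy inverse link is assumed.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(3)
abs_err = rng.exponential(0.1, size=200)                         # per-point |y_BS - y_true|
beta_bs = 1.0 / (abs_err + 0.05) + rng.normal(0, 0.5, size=200)  # noisy inverse link
rho, p = spearmanr(beta_bs, abs_err)
print(rho, p)   # rho well below zero supports the confidence reading
```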

Circularity Check

0 steps flagged

No circularity: KDE mode and Bagging Score derived directly from bagged predictions without self-referential reduction

full rationale

The paper defines y_BS via KDE applied to the empirical distribution of bagged NN outputs and introduces beta_BS as a quality criterion extracted from the same density estimate. No equations, definitions, or self-citations are present that make either quantity equivalent to its inputs by construction (e.g., no fitted parameter renamed as prediction, no uniqueness theorem imported from prior author work, no ansatz smuggled via citation). The derivation remains self-contained against the bagging ensemble outputs; external benchmarks or calibration would be needed for correctness but are irrelevant to circularity. This is the expected honest non-finding for a method paper whose central step is a standard nonparametric density estimation applied to an existing set of predictions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no explicit free parameters, axioms, or invented entities beyond standard KDE and bagging concepts; no fitted constants or new postulated objects are mentioned.

pith-pipeline@v0.9.0 · 5447 in / 1338 out tokens · 61148 ms · 2026-05-13T19:08:43.855208+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

29 extracted references · 29 canonical work pages

  [1] Hunter, David, Hao Yu, Michael S. Pukish III, Janusz Kolbusz, and Bogdan M. Wilamowski. 2012. "Selection of proper neural network sizes and architectures—A comparative study." IEEE Transactions on Industrial Informatics 8 (2): 228–240.
  [2] Sykes, Alan O. 1993. "An introduction to regression analysis."
  [3] Breiman, Leo. 1996. "Bagging predictors." Machine Learning 24: 123–140.
  [4] Schapire, Robert E. 2003. "The boosting approach to machine learning: An overview." Nonlinear Estimation and Classification 149–171.
  [5] Drucker, Harris, Corinna Cortes, Lawrence D. Jackel, Yann LeCun, and Vladimir Vapnik. 1994. "Boosting and other machine learning algorithms." In Machine Learning Proceedings 1994, 53–61. Elsevier.
  [6] Wang, Ying, Yong Fan, Priyanka Bhatt, and Christos Davatzikos. 2010. "High-dimensional pattern regression using machine learning: from medical images to continuous clinical variables." NeuroImage 50 (4): 1519–1535.
  [7] Bauer, Eric, and Ron Kohavi. 1999. "An empirical comparison of voting classification algorithms: Bagging, boosting, and variants." Machine Learning 36: 105–139.
  [8] Friedman, Jerome H., and Peter Hall. 2007. "On bagging and nonlinear estimation." Journal of Statistical Planning and Inference 137 (3): 669–683.
  [9] Chen, Tao, and Jianghong Ren. 2009. "Bagging for Gaussian process regression." Neurocomputing 72 (7–9): 1605–1610.
  [10] Grandvalet, Yves. 2004. "Bagging equalizes influence." Machine Learning 55: 251–270.
  [11] Guo, Hongwei, Xiaoying Zhuang, Jianfeng Chen, and Hehua Zhu. 2022. "Predicting earthquake-induced soil liquefaction based on machine learning classifiers: A comparative multi-dataset study." International Journal of Computational Methods 19 (08): 2142004.
  [12] Lin, Shan, Zenglong Liang, Shuaixing Zhao, Miao Dong, Hongwei Guo, and Hong Zheng. 2024. "A comprehensive evaluation of ensemble machine learning in geotechnical stability analysis and explainability." International Journal of Mechanics and Materials in Design 20 (2): 331–352.
  [13] Parzen, Emanuel. 1962. "On estimation of a probability density function and mode." The Annals of Mathematical Statistics 33 (3): 1065–1076.
  [14] Weglarczyk, Stanislaw. 2018. "Kernel density estimation and its application." In ITM Web of Conferences, Vol. 23, 00037. EDP Sciences.
  [15] Seitz, Philipp, and Jan Schmitt. 2023. "Alternating Transfer Functions to Prevent Overfitting in Non-Linear Regression with Neural Networks." Journal of Experimental & Theoretical Artificial Intelligence 1–22.
  [16] Yeh, I-C. 1998. "Modeling of strength of high-performance concrete using artificial neural networks." Accessed 2023-09-13, https://www.kaggle.com/datasets/maajdl/yeh-concret-data.
  [17] Chen, Yen-Chi. 2017. "A tutorial on kernel density estimation and recent advances." Biostatistics & Epidemiology 1 (1): 161–187.
  [18] Sheather, Simon J. 2004. "Density estimation." Statistical Science 588–597.
  [19] Mohandoss, Divya Pramasani, Yong Shi, and Kun Suo. 2021. "Outlier prediction using random forest classifier." In 2021 IEEE 11th Annual Computing and Communication Workshop and Conference (CCWC), 0027–0033. IEEE.
  [20] Yeh, I-C. 1998. "Modeling of strength of high-performance concrete using artificial neural networks." Cement and Concrete Research 28 (12): 1797–1808.
  [21] Chou, Jui-Sheng, et al. 2011. "Optimizing the prediction accuracy of concrete compressive strength based on a comparison of data-mining techniques." Journal of Computing in Civil Engineering 25 (3): 242–253.
  [22–23] Erdal, Halil Ibrahim, Onur Karakurt, and Ersin Namli. 2013. "High performance concrete compressive strength forecasting using ensemble models based on discrete wavelet transform." Engineering Applications of Artificial Intelligence, no. 4: 1246–1254.
  [24] Chou, Jui-Sheng, and Anh-Duc Pham. 2013. "Enhanced artificial intelligence for ensemble approach to predicting high performance concrete compressive strength." Construction and Building Materials 49: 554–563.
  [25] Chou, Jui-Sheng, et al. 2014. "Machine learning in concrete strength simulations: Multi-nation data analytics." Construction and Building Materials 73: 771–780.
  [26] Cheng, Min-Yuan, Pratama Mahardika Firdausi, and Doddy Prayogo. 2014. "High-performance concrete compressive strength prediction using Genetic Weighted Pyramid Operation Tree (GWPOT)." Engineering Applications of Artificial Intelligence 29: 104–113.
  [27] Pham, Anh-Duc, Nhat-Duc Hoang, and Quang-Trung Nguyen. 2016. "Predicting compressive strength of high-performance concrete using metaheuristic-optimized least squares support vector regression." Journal of Computing in Civil Engineering 30 (3): 06015002.
  [28] Han, Qinghua, et al. 2019. "A generalized method to predict the compressive strength of high-performance concrete by improved random forest algorithm." Construction and Building Materials 226: 734–742.
  [29] Chakraborty, Debaditya, Ibukun Awolusi, and Lilianna Gutierrez. 2021. "An explainable machine learning model to predict and elucidate the compressive behavior of high-performance concrete." Results in Engineering 11: 100245.