Estimating a sharp convergence bound for randomized ensembles

Miles E. Lopes

arxiv: 1303.0727 · v3 · pith:CX2ZBOZHnew · submitted 2013-03-04 · 🧮 math.PR · cs.SI· math.ST· stat.ML· stat.TH

Estimating a sharp convergence bound for randomized ensembles

Miles E. Lopes This is my paper

classification 🧮 math.PR cs.SImath.STstat.MLstat.TH

keywords boundrandomizedclassifierserrorpredictionvarianceconvergenceensemble

0 comments

read the original abstract

When randomized ensembles such as bagging or random forests are used for binary classification, the prediction error of the ensemble tends to decrease and stabilize as the number of classifiers increases. However, the precise relationship between prediction error and ensemble size is unknown in practice. In the standard case when classifiers are aggregated by majority vote, the present work offers a way to quantify this convergence in terms of "algorithmic variance," i.e. the variance of prediction error due only to the randomized training algorithm. Specifically, we study a theoretical upper bound on this variance, and show that it is sharp --- in the sense that it is attained by a specific family of randomized classifiers. Next, we address the problem of estimating the unknown value of the bound, which leads to a unique twist on the classical problem of non-parametric density estimation. In particular, we develop an estimator for the bound and show that its MSE matches optimal non-parametric rates under certain conditions. (Concurrent with this work, some closely related results have also been considered in Cannings and Samworth (2017) and Lopes (2019).)

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

How Many Trees in a Random Forest? A Revisited Approach with Plateau Search and Optuna Integration
cs.LG 2026-06 conditional novelty 6.0

A triplet-based plateau search algorithm is proposed to adaptively determine a near-minimal number of trees for random forests by monitoring relative OOB score changes across forest size triplets, removing n_trees fro...