pith. sign in

arxiv: 2606.03549 · v2 · pith:EOZ75ITCnew · submitted 2026-06-02 · 💻 cs.LG · math.PR

How Many Trees in a Random Forest? A Revisited Approach with Plateau Search and Optuna Integration

Pith reviewed 2026-06-28 11:08 UTC · model grok-4.3

classification 💻 cs.LG math.PR
keywords random foresthyperparameter optimizationnumber of treesout-of-bag scoreplateau searchTPEensemble size
0
0 comments X

The pith

A triplet-based search finds near-minimal tree counts for random forests by tracking relative OOB changes without fixing a search range.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to address the difficulty of tuning the number of trees in random forest hyperparameter optimization, where scores improve monotonically and standard methods like TPE push estimates to the upper boundary while early stopping risks premature termination due to noise. It proposes removing the number of trees from the direct TPE search space and instead using an integrated triplet-based plateau-search algorithm that monitors relative changes in out-of-bag scores across a triplet of forest sizes. This adaptively identifies a near-minimal sufficient ensemble size, controlled by a tolerance parameter, while accumulating information across optimization trials. A theoretical analysis relates the relative OOB criterion to the gap from the limiting score and provides an asymptotic variance estimate. Experiments on benchmark datasets show the method often selects smaller sizes than common heuristics but larger ones for certain high-dimensional cases.

Core claim

The triplet-based plateau-search algorithm removes the number of trees from the direct TPE search space and adaptively tracks a near-minimal sufficient ensemble size by monitoring relative changes in the out-of-bag score across a triplet of forest sizes and shifting this triplet accordingly, yielding an automated procedure based on a tolerance parameter. The relative OOB-score criterion is related to the gap between the current and limiting scores, with a derived asymptotic variance estimate for the OOB-based absolute relative difference.

What carries the argument

The triplet-based plateau-search algorithm that shifts a triplet of forest sizes based on relative out-of-bag score changes controlled by a tolerance parameter.

If this is right

  • The number of trees is removed from the direct TPE search space while still using information accumulated across HPO trials.
  • The procedure is automated and user-interpretable based on a single tolerance parameter.
  • The selected number of trees differs substantially from the common heuristic, being smaller for most classical benchmark datasets and larger for some high-dimensional bioinformatics datasets.
  • The relative OOB-score criterion relates directly to the gap from the limiting score with a derived asymptotic variance estimate.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method could extend to other ensemble models where performance improves monotonically with size.
  • It may lower overall computation in hyperparameter optimization by avoiding evaluation of unnecessarily large forests.
  • The tolerance parameter offers a direct way to trade off between ensemble size and expected performance gain.
  • Integration with existing HPO libraries makes the approach immediately usable without new search-space definitions.

Load-bearing premise

Relative changes in the out-of-bag score across a moving triplet of forest sizes provide a reliable signal for detecting the plateau without being unduly affected by score noise.

What would settle it

On a dataset where the true out-of-bag score continues to improve substantially past the size chosen by the algorithm, showing the detected plateau was not yet reached.

Figures

Figures reproduced from arXiv: 2606.03549 by Andrey Dukhovny, Andrey Lange, Vadim Porvatov.

Figure 1
Figure 1. Figure 1: Three scenarios in plateau search for tree count in Random Forest hyperparameter optimization (limiting score S∞ — plateau level): shift right — early stage, significant improvements, stay — transition to plateau, diminishing returns, and shift left — plateau region, minimal improvements. Points L, B, R represent consecutive evaluations with vertical gaps platL and platR . right by multiplying by sf: [L, B… view at source ↗
Figure 2
Figure 2. Figure 2: Examples of random and mean trajectories of the central triplet point B across HPO trials for three datasets, with the mean computed over 20 runs. The background colormap shows the empirical frequency of the corresponding tree count at each trial. Here T0 = 100 and sf = 1.5; in the left panel, ε = 3 × 10−3 , and in the other two panels, ε = 10−3 . the plateau is reached. At the same time, if the grid grew … view at source ↗
Figure 3
Figure 3. Figure 3: Sensitivity of the best achieved OOB performance to the tolerance parameter ε for three representative datasets: a large-n classification task (CreditCardDefault, ROC-AUC), a regression task (Abalone, RMSE), and a high-dimensional classification task (Arcene, ROC-AUC). Boxplots summarize 20 random seeds for each ε; the central line denotes the median and boxes indicate the interquartile range (whiskers fol… view at source ↗
Figure 4
Figure 4. Figure 4: Runtime and selected number of trees T for TPE, HB, ES, and the proposed PLATEAU search across all datasets. Bars show the mean over 20 random seeds and error bars indicate ±1 standard deviation. For TPE and HB, the tree-count search uses the fixed upper bound Tmax = 2565; for ES and PLATEAU, the stopping tolerance is set to ε = εmin. All experiments use tune_criterion=‘‘No’’, only_depth=‘‘No’’, sf = 1.5, … view at source ↗
read the original abstract

Hyperparameter optimization (HPO) for Random Forest faces a specific difficulty in tuning the number of trees: the predictive score typically improves monotonically with ensemble size, so standard methods such as Tree-structured Parzen Estimator (TPE) and Hyperband require a predefined search range and often drive the estimate toward its right boundary. Early-stopping strategies avoid fixing such a range, but can be sensitive to score noise and prone to premature stopping. To address this, we propose an integrated triplet-based plateau-search algorithm that removes the number of trees from the direct TPE search space and still exploits information accumulated across HPO trials. The method adaptively tracks a near-minimal sufficient ensemble size by monitoring relative changes in the out-of-bag (OOB) score across a triplet of forest sizes and shifting this triplet accordingly. This yields an automated and user-interpretable procedure based on a tolerance parameter. We also provide a theoretical analysis: we relate the proposed relative OOB-score criterion to the gap between the current and limiting scores, and derive an asymptotic variance estimate for the corresponding OOB-based absolute relative difference. Experiments show that the selected number of trees can differ substantially from the common heuristic: for most classical benchmark datasets it is smaller, whereas for some high-dimensional bioinformatics datasets, such as Arcene and Dorothea, it is larger. The source code and reproducible experiments are available at https://github.com/lange-am/rf_plateau_hpo.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes an integrated triplet-based plateau-search algorithm for determining the number of trees (n_trees) during TPE-based hyperparameter optimization of Random Forests. By monitoring relative changes in out-of-bag (OOB) scores across a moving triplet of forest sizes and shifting the window accordingly, the method removes n_trees from the direct TPE search space while adaptively identifying a near-minimal sufficient ensemble size based on a tolerance parameter. It includes a theoretical analysis relating the relative OOB criterion to the gap versus the limiting score plus an asymptotic variance estimate for the absolute relative difference, and reports experiments on classical benchmarks and high-dimensional bioinformatics datasets (e.g., Arcene, Dorothea) where the selected n_trees often differs from common heuristics. Reproducible code is provided.

Significance. If the central claim holds, the work offers a practical, automated alternative to fixed-range or early-stopping strategies for ensemble size in RF HPO, with potential for more efficient models. The reproducible experiments and code are a clear strength. The theoretical connection between the triplet criterion and the score gap is a positive contribution if the derivations are complete and the finite-sample behavior is addressed.

major comments (2)
  1. [§3] §3 (theoretical analysis): the derivation relates the proposed relative OOB criterion directly to the limiting score gap using the method's own definitions and supplies an asymptotic variance for the absolute relative difference, but does not address finite-sample OOB variance (e.g., small n, high-dimensional data, or early stopping within forests). This leaves the reliability of the triplet rule under realistic noise unproven, which is load-bearing for the claim that the procedure is automated and reliable without post-hoc adjustments.
  2. [§4] §4 (algorithm description) and experiments: the triplet rule is presented as detecting the plateau via relative OOB changes with a tolerance parameter, yet no analysis or ablation shows behavior when noise exceeds the tolerance (as flagged by the weakest assumption). If the moving triplet can stop prematurely or continue expanding, the central claim that it yields a near-minimal sufficient size is undermined; a concrete finite-sample bound or simulation under OOB variance would be needed.
minor comments (2)
  1. [§5] The abstract and §5 mention that selected n_trees differs substantially from heuristics on most datasets but is larger on Arcene/Dorothea; adding a table with exact values, tolerance settings, and variance across runs would improve clarity.
  2. Notation for the triplet window and relative difference should be defined once with consistent symbols (e.g., avoid re-using OOB without subscript) to aid readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and for recognizing the reproducibility of the code as well as the potential practical value of the approach. We respond point-by-point to the major comments below.

read point-by-point responses
  1. Referee: [§3] §3 (theoretical analysis): the derivation relates the proposed relative OOB criterion directly to the limiting score gap using the method's own definitions and supplies an asymptotic variance for the absolute relative difference, but does not address finite-sample OOB variance (e.g., small n, high-dimensional data, or early stopping within forests). This leaves the reliability of the triplet rule under realistic noise unproven, which is load-bearing for the claim that the procedure is automated and reliable without post-hoc adjustments.

    Authors: We thank the referee for this observation. The analysis in §3 indeed establishes the connection to the limiting score gap and supplies the asymptotic variance of the absolute relative difference, but does not derive finite-sample bounds or explicitly characterize behavior under the OOB variance encountered with small n or high-dimensional data. We agree this is a limitation for the reliability claim. In the revised manuscript we will expand §3 with a dedicated paragraph acknowledging the asymptotic nature of the results and add a short simulation study that injects realistic OOB noise levels to illustrate the triplet rule's sensitivity. revision: yes

  2. Referee: [§4] §4 (algorithm description) and experiments: the triplet rule is presented as detecting the plateau via relative OOB changes with a tolerance parameter, yet no analysis or ablation shows behavior when noise exceeds the tolerance (as flagged by the weakest assumption). If the moving triplet can stop prematurely or continue expanding, the central claim that it yields a near-minimal sufficient size is undermined; a concrete finite-sample bound or simulation under OOB variance would be needed.

    Authors: We agree that the current version contains no controlled ablation or simulation examining the triplet rule when OOB noise exceeds the tolerance. The reported experiments cover a range of datasets but do not isolate this failure mode. We will revise §4 and the experimental section to include an ablation study that varies noise amplitude and tolerance, documenting cases of premature stopping or continued expansion, together with practical guidance on tolerance selection based on the observed OOB variance. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained

full rationale

The paper defines a new triplet-based plateau-search procedure that monitors relative OOB-score changes to select n_trees outside the TPE space, then derives an asymptotic variance for its own absolute-relative-difference statistic and relates the criterion to the limiting score gap. These steps follow directly from the algorithm's definitions and standard OOB properties without reducing any claimed result to a fitted input, self-citation chain, or ansatz smuggled from prior work. Experiments on external benchmarks provide independent validation. No load-bearing step collapses to its own inputs by construction.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The method depends on one user-chosen tolerance parameter and the domain assumption that OOB scores improve monotonically toward a limit; no new entities are postulated.

free parameters (1)
  • tolerance parameter
    User-set threshold on relative OOB score change that determines when the plateau is reached and the search stops.
axioms (1)
  • domain assumption Out-of-bag scores improve monotonically with increasing ensemble size up to an asymptotic limit.
    This monotonicity underpins the plateau detection logic and the relation to the limiting score gap.

pith-pipeline@v0.9.1-grok · 5799 in / 1222 out tokens · 34637 ms · 2026-06-28T11:08:30.483250+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

81 extracted references · 8 canonical work pages

  1. [1]

    Breiman, ‘‘Bagging predictors,’’Machine Learning, vol

    L. Breiman, ‘‘Bagging predictors,’’Machine Learning, vol. 24, no. 2, pp. 123–140, 1996

  2. [2]

    ——, ‘‘Random forests,’’Machine Learning, vol. 45, no. 1, pp. 5–32, 2001

  3. [3]

    Breiman and A

    L. Breiman and A. Cutler,Manual—Setting up, using, and understanding random forests V4.0, 2003, [Online]. Available: https://www.stat.berkeley.edu/ breiman/Using_random_forests_v4.0.pdf. Accessed on: Jun. 19, 2026. [Online]. Available: https://www.stat.berkele y.edu/~breiman/Using_random_forests_v4.0.pdf

  4. [4]

    Biau and E

    G. Biau and E. Scornet, ‘‘A random forest guided tour,’’TEST, vol. 25, no. 2, pp. 197–227, Jun. 2016

  5. [5]

    R. E. Schapire, ‘‘A brief introduction to boosting,’’ inProceedings of the Sixteenth International Joint Conference on Artificial Intelligence. Morgan Kaufmann, 1999, pp. 1401–1406. [Online]. Available: https: //www.ijcai.org/Proceedings/99-2/Papers/103.pdf

  6. [6]

    J. H. Friedman, ‘‘Greedy function approximation: A gradient boosting machine,’’The Annals of Statistics, vol. 29, no. 5, pp. 1189–1232, 2001

  7. [7]

    ——, ‘‘Stochastic gradient boosting,’’Computational Statistics & Data Analysis, vol. 38, no. 4, pp. 367–378, 2002

  8. [8]

    Borisov, T

    V . Borisov, T. Leemann, K. Seßler, J. Haug, M. Pawelczyk, and G. Kasneci, ‘‘Deep neural networks and tabular data: A survey,’’IEEE Transactions on 21 V.A. Porvatov et al.: How Many Trees in a Random Forest? A Revisited Approach Neural Networks and Learning Systems, vol. 35, no. 6, pp. 7499–7519, 2024

  9. [9]

    Grinsztajn, E

    L. Grinsztajn, E. Oyallon, and G. V aroquaux, ‘‘Why do tree-based models still outperform deep learning on typical tabular data?’’ inAdvances in Neural Information Processing Systems, vol. 35. Curran Associates, Inc., 2022, pp. 507–520. [Online]. Available: https://proceedings.neurips.cc/p aper_files/paper/2022/file/0378c7692da36807bdec87ab043cdadc-Paper ...

  10. [10]

    Shwartz-Ziv and A

    R. Shwartz-Ziv and A. Armon, ‘‘Tabular data: Deep learning is not all you need,’’Information Fusion, vol. 81, pp. 84–90, 2022

  11. [11]

    Chen and C

    T. Chen and C. Guestrin, ‘‘Xgboost: A scalable tree boosting system,’’ inProceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016, pp. 785–794

  12. [12]

    G. Ke, Q. Meng, T. Finley, T. Wang, W. Chen, W. Ma, Q. Y e, and T.-Y . Liu, ‘‘Lightgbm: A highly efficient gradient boosting decision tree,’’Advances in Neural Information Processing Systems, vol. 30, pp. 3146–3154, 2017. [Online]. Available: https://proceedings.neurips.cc/paper/2017/file/6449 f44a102fde848669bdd9eb6b76fa-Paper.pdf

  13. [13]

    Prokhorenkova, G

    L. Prokhorenkova, G. Gusev, A. V orobev, A. V . Dorogush, and A. Gulin, ‘‘Catboost: unbiased boosting with categorical features,’’ inAdvances in Neural Information Processing Systems, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, Eds., vol. 31. Curran Associates, Inc., 2018. [Online]. Available: https://proceedings.neu...

  14. [14]

    Fernández-Delgado, E

    M. Fernández-Delgado, E. Cernadas, S. Barro, and D. Amorim, ‘‘Do we need hundreds of classifiers to solve real world classification problems?’’ Journal of Machine Learning Research, vol. 15, no. 90, pp. 3133–3181,

  15. [15]

    Available: http://jmlr.org/papers/v15/delgado14a.html

    [Online]. Available: http://jmlr.org/papers/v15/delgado14a.html

  16. [16]

    Kelly, R

    M. Kelly, R. Longjohn, and K. Nottingham, ‘‘UCI machine learning repository,’’ University of California, Irvine, 2023, [Online]. Available: https://archive.ics.uci.edu. Accessed on: Jun. 19, 2026

  17. [17]

    Pedregosa, G

    F. Pedregosa, G. V aroquaux, A. Gramfort, V . Michel, B. Thirion, O. Grisel, M. Blondel, P . Prettenhofer, R. Weiss, V . Dubourg, J. V anderplas, A. Pas- sos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, ‘‘Scikit- learn: Machine learning in Python,’’Journal of Machine Learning Re- search, vol. 12, pp. 2825–2830, 2011

  18. [18]

    Liaw and M

    A. Liaw and M. Wiener, ‘‘Classification and regression by randomforest,’’ R News, vol. 2, no. 3, pp. 18–22, 2002. [Online]. Available: https: //CRAN.R-project.org/doc/Rnews/

  19. [19]

    Hothorn, K

    T. Hothorn, K. Hornik, and A. Zeileis, ‘‘Unbiased recursive partitioning: A conditional inference framework,’’Journal of Computational and Graphi- cal Statistics, vol. 15, no. 3, pp. 651–674, 2006

  20. [20]

    Hothorn and A

    T. Hothorn and A. Zeileis, ‘‘partykit: A modular toolkit for recursive partytioning in r,’’Journal of Machine Learning Research, vol. 16, no. 118, pp. 3905–3909, 2015. [Online]. Available: http://jmlr.org/papers/v1 6/hothorn15a.html

  21. [21]

    M. N. Wright and A. Ziegler, ‘‘ranger: A fast implementation of random forests for high dimensional data in c++ and r,’’Journal of Statistical Software, vol. 77, no. 1, p. 1–17, 2017. [Online]. Available: https://www.jstatsoft.org/index.php/jss/article/view/v077i01

  22. [22]

    Probst, M

    P . Probst, M. Wright, and A.-L. Boulesteix, ‘‘Hyperparameters and tuning strategies for random forest,’’ArXiv preprint arXiv:1804.03515, 2018. [Online]. Available: https://arxiv.org/abs/1804.03515

  23. [23]

    N., and Boulesteix, A.-L

    P . Probst, M. N. Wright, and A.-L. Boulesteix, ‘‘Hyperparameters and tuning strategies for random forest,’’WIREs Data Mining and Knowledge Discovery, vol. 9, no. 3, p. e1301, 2019. [Online]. Available: https://wires.onlinelibrary.wiley.com/doi/abs/10.1002/widm.1301

  24. [24]

    T. M. Lange, M. Gültas, A. O. Schmitt, and F. Heinrich, ‘‘optrf: Optimising random forest stability by determining the optimal number of trees,’’ BMC Bioinformatics, vol. 26, no. 1, p. 95, Mar 2025. [Online]. Available: https://doi.org/10.1186/s12859-025-06097-1

  25. [25]

    Díaz-Uriarte and S

    R. Díaz-Uriarte and S. A. de Andrés, ‘‘Gene selection and classification of microarray data using random forest,’’BMC Bioinformatics, vol. 7, p. 3, 2006

  26. [26]

    B. A. Goldstein, A. E. Hubbard, A. Cutler, and L. F. Barcellos, ‘‘An ap- plication of random forests to a genome-wide association dataset: Method- ological considerations & new findings,’’BMC Genetics, vol. 11, p. 49, 2010

  27. [27]

    B. A. Goldstein, E. C. Polley, and F. B. S. Briggs, ‘‘Random forests for genetic association studies,’’Statistical Applications in Genetics and Molecular Biology, vol. 10, no. 1, p. 32, 2011

  28. [28]

    Janitza, E

    S. Janitza, E. Celik, and A.-L. Boulesteix, ‘‘A computationally fast variable importance test for random forests for high-dimensional data,’’Advances in Data Analysis and Classification, vol. 12, no. 4, pp. 885–915, 2018

  29. [29]

    Breiman, J

    L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone,Classification and Regression Trees. Boca Raton: Chapman and Hall/CRC, 1984

  30. [30]

    Louppe, ‘‘Understanding random forests: From theory to practice,’’

    G. Louppe, ‘‘Understanding random forests: From theory to practice,’’

  31. [31]

    Available: https://arxiv.org/abs/1407.7502

    [Online]. Available: https://arxiv.org/abs/1407.7502

  32. [32]

    Guyon, J

    I. Guyon, J. Weston, S. Barnhill, and V . V apnik, ‘‘Gene selection for cancer classification using support vector machines,’’Machine Learning, vol. 46, no. 1-3, pp. 389–422, 2002

  33. [33]

    M. B. Kursa and W. R. Rudnicki, ‘‘Feature selection with the Boruta package,’’Journal of Statistical Software, vol. 36, no. 11, p. 1–13, 2010. [Online]. Available: https://www.jstatsoft.org/index.php/jss/article/view/v 036i11

  34. [34]

    Marbach, J

    D. Marbach, J. C. Costello, R. Küffner, N. M. V ega, R. J. Prill, D. M. Camacho, K. R. Allison, M. Kellis, J. J. Collins, and G. Stolovitzky, ‘‘Wisdom of crowds for robust gene network inference,’’Nature Methods, vol. 9, no. 8, pp. 796–804, Aug. 2012

  35. [35]

    V . A. Huynh-Thu, A. Irrthum, L. Wehenkel, and P . Geurts, ‘‘Inferring regulatory networks from expression data using tree-based methods,’’PLoS One, vol. 5, no. 9, p. e12776, 2010

  36. [36]

    Tolosi and T

    L. Tolosi and T. Lengauer, ‘‘Classification with correlated features: unre- liability of feature ranking and solutions,’’Bioinformatics, vol. 27, no. 14, pp. 1986–1994, Jul. 2011

  37. [37]

    Strobl, A.-L

    C. Strobl, A.-L. Boulesteix, A. Zeileis, and T. Hothorn, ‘‘Bias in random forest variable importance measures: Illustrations, sources and a solution,’’ BMC Bioinformatics, vol. 8, p. 25, 2007

  38. [38]

    Hooker, L

    G. Hooker, L. Mentch, and S. Zhou, ‘‘Unrestricted permutation forces extrapolation: variable importance requires at least one more model, or there is no free variable importance,’’Statistics and Computing, vol. 31, no. 6, p. 82, Oct 2021. [Online]. Available: https://doi.org/10.1007/s112 22-021-10057-z

  39. [39]

    S. M. Lundberg and S.-I. Lee, ‘‘A unified approach to interpreting model predictions,’’ inAdvances in Neural Information Processing Systems, vol. 30. Curran Associates, Inc., 2017, pp. 4768–4777. [Online]. Available: https://proceedings.neurips.cc/paper/2017/file/8a20a8621978 632d76c43dfd28b67767-Paper.pdf

  40. [40]

    Covert, S

    I. Covert, S. M. Lundberg, and S.-I. Lee, ‘‘Understanding global feature contributions with additive importance measures,’’ inAdvances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, Eds., vol. 33. Curran Associates, Inc., 2020, pp. 17 212–17 223. [Online]. Available: https://proceedings.neurips.cc/p...

  41. [42]

    Available: https://arxiv.org/abs/1802.03888

    [Online]. Available: https://arxiv.org/abs/1802.03888

  42. [43]

    S. M. Lundberg, G. Erion, H. Chen, A. DeGrave, J. M. Prutkin, B. Nair, R. Katz, J. Himmelfarb, N. Bansal, and S.-I. Lee, ‘‘From local explanations to global understanding with explainable ai for trees,’’Nature Machine Intelligence, vol. 2, no. 1, pp. 56–67, 2020

  43. [44]

    Sundararajan, A

    M. Sundararajan, A. Taly, and Q. Y an, ‘‘Axiomatic attribution for deep networks,’’Proceedings of the 34th International Conference on Machine Learning (ICML), vol. 70, pp. 3319–3328, 2017. [Online]. Available: https://arxiv.org/abs/1703.01365

  44. [45]

    Shrikumar, P

    A. Shrikumar, P . Greenside, and A. Kundaje, ‘‘Learning important features through propagating activation differences,’’ inInternational Conference on Machine Learning (ICML), vol. 70, 2017, pp. 3145–3153. [Online]. Available: https://arxiv.org/abs/1704.02685

  45. [46]

    Adebayo, J

    J. Adebayo, J. Gilmer, M. Muelly, I. Goodfellow, M. Hardt, and B. Kim, ‘‘Sanity checks for saliency maps,’’ inAdvances in Neural Information Processing Systems, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, Eds., vol. 31, 2018, pp. 9505–9515. [Online]. Available: https://proceedings.neurips.cc/paper_files/paper/201 8/h...

  46. [47]

    Ghorbani, A

    A. Ghorbani, A. Abid, and J. Zou, ‘‘Interpretation of neural networks is fragile,’’ inProceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 01, 2019, pp. 3681–3688

  47. [48]

    L. Sixt, M. Granz, and T. Landgraf, ‘‘When explanations lie: Why many modified BP attributions fail,’’ inProceedings of the 37th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, H. D. III and A. Singh, Eds., vol

  48. [49]

    9046–9057

    PMLR, 13–18 Jul 2020, pp. 9046–9057. [Online]. Available: https://proceedings.mlr.press/v119/sixt20a.html 22 V.A. Porvatov et al.: How Many Trees in a Random Forest? A Revisited Approach With Plateau Search

  49. [50]

    Scornet, ‘‘On the asymptotics of random forests,’’Journal of Multivariate Analysis, vol

    E. Scornet, ‘‘On the asymptotics of random forests,’’Journal of Multivariate Analysis, vol. 146, pp. 72–83, 2016, special Issue on Statistical Models and Methods for High or Infinite Dimensional Spaces. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0 047259X15001542

  50. [51]

    Wager, T

    S. Wager, T. Hastie, and B. Efron, ‘‘Confidence intervals for random forests: The jackknife and the infinitesimal jackknife,’’Journal of Machine Learning Research, vol. 15, no. 1, pp. 1625–1651, 2014. [Online]. Available: https://www.jmlr.org/papers/volume15/wager14a/wa ger14a.pdf

  51. [52]

    Bergstra, R

    J. Bergstra, R. Bardenet, Y . Bengio, and B. Kégl, ‘‘Algorithms for hyper-parameter optimization,’’ inAdvances in Neural Information Processing Systems, J. Shawe-Taylor, R. Zemel, P . Bartlett, F. Pereira, and K. Weinberger, Eds., vol. 24. Curran Associates, Inc., 2011, pp. 2546–2554. [Online]. Available: https://proceedings.neurips.cc/paper_fil es/paper/...

  52. [53]

    T. M. Oshiro, P . S. Perez, and J. A. Baranauskas, ‘‘How many trees in a random forest?’’ inMachine Learning and Data Mining in Pattern Recognition. Springer Berlin Heidelberg, 2012, pp. 154–168

  53. [54]

    Taha Aksu, Gerald Woo, Juncheng Liu, Xu Liu, Chenghao Liu, Silvio Savarese, Caiming Xiong, and Doyen Sahoo

    T. Akiba, S. Sano, T. Y anase, T. Ohta, and M. Koyama, ‘‘Optuna: A next-generation hyperparameter optimization framework,’’ inProceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, ser. KDD ’19. New Y ork, NY , USA: Association for Computing Machinery, 2019, p. 2623–2631. [Online]. Available: https://doi.org/10.1...

  54. [55]

    Strobl and A

    C. Strobl and A. Zeileis, ‘‘Danger: High power! ? exploring the statistical properties of a test for random forest variable importance,’’ Ludwig-Maximilians-Universität München, Tech. Rep. 17, 2008. [Online]. Available: https://epub.ub.uni-muenchen.de/2111/

  55. [56]

    Genuer, J.-M

    R. Genuer, J.-M. Poggi, and C. Tuleau-Malot, ‘‘V ariable selection using random forests,’’Pattern Recognition Letters, vol. 31, no. 14, pp. 2225– 2236, 2010

  56. [57]

    Genuer, J.-M

    R. Genuer, J.-M. Poggi, and C. Tuleau, ‘‘Random forests: some methodological insights,’’ 2008. [Online]. Available: https://arxiv.org/abs/ 0811.3619

  57. [58]

    Cuzzocrea, S

    A. Cuzzocrea, S. L. Francis, and M. M. Gaber, ‘‘An information-theoretic approach for setting the optimal number of decision trees in random forests,’’ in2013 IEEE International Conference on Systems, Man, and Cybernetics, 2013, pp. 1013–1019

  58. [59]

    Probst and A.-L

    P . Probst and A.-L. Boulesteix, ‘‘To tune or not to tune the number of trees in random forest,’’Journal of Machine Learning Research, vol. 18, no. 181, pp. 1–18, 2018. [Online]. Available: https://www.jmlr.org/papers/vo lume18/17-269/17-269.pdf

  59. [60]

    Latinne, O

    P . Latinne, O. Debeir, and C. Decaestecker, ‘‘Limiting the number of trees in random forests,’’ inMultiple Classifier Systems, J. Kittler and F. Roli, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2001, pp. 178–187

  60. [61]

    Hernández-Lobato, G

    D. Hernández-Lobato, G. Martínez-Muñoz, and A. Suárez, ‘‘How large should ensembles of classifiers be?’’Pattern Recognition, vol. 46, no. 5, pp. 1323–1336, 2013

  61. [62]

    M. E. Lopes, ‘‘A sharp bound on the computation-accuracy trade- off for majority voting ensembles (version 2),’’ 2016, arXiv preprint arXiv:1303.0727v2

  62. [63]

    ——, ‘‘Estimating the algorithmic variance of randomized ensembles via the bootstrap,’’The Annals of Statistics, vol. 47, no. 2, pp. 1088 – 1112,

  63. [64]

    Available: https://doi.org/10.1214/18-AOS1707

    [Online]. Available: https://doi.org/10.1214/18-AOS1707

  64. [65]

    Arlot and R

    S. Arlot and R. Genuer, ‘‘Analysis of purely random forests bias,’’arXiv preprint arXiv:1407.3939, 2014

  65. [66]

    Demidova and M

    L. Demidova and M. Ivkina, ‘‘Approach to determining the boundaries of the search range for the number of trees in the random forest algorithm,’’ in2020 9th Mediterranean Conference on Embedded Computing (MECO), 2020, pp. 1–4

  66. [67]

    Bernard, L

    S. Bernard, L. Heutte, and S. Adam, ‘‘Influence of hyperparameters on ran- dom forest accuracy,’’ inMultiple Classifier Systems, J. A. Benediktsson, J. Kittler, and F. Roli, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2009, pp. 171–180

  67. [68]

    Scornet, ‘‘Tuning parameters in random forests,’’ESAIM: Procs, vol

    E. Scornet, ‘‘Tuning parameters in random forests,’’ESAIM: Procs, vol. 60, pp. 144–162, 2017. [Online]. Available: https://doi.org/10.1051/ proc/201760144

  68. [69]

    Feurer and F

    M. Feurer and F. Hutter,Hyperparameter Optimization. Cham: Springer International Publishing, 2019, pp. 3–33. [Online]. Available: https: //doi.org/10.1007/978-3-030-05318-5_1

  69. [70]

    Bischl, M

    B. Bischl, M. Binder, M. Lang, T. Pielok, J. Richter, S. Coors, J. Thomas, T. Ullmann, M. Becker, A.-L. Boulesteix, D. Deng, and M. Lindauer, ‘‘Hyperparameter optimization: Foundations, algorithms, best practices, and open challenges,’’WIREs Data Mining and Knowledge Discovery, vol. 13, no. 2, p. e1484, 2023. [Online]. Available: https: //wires.onlinelibr...

  70. [71]

    Bartz, T

    E. Bartz, T. Bartz-Beielstein, M. Zaefferer, and O. Mersmann,Hyperpa- rameter tuning for machine and deep learning with R: A practical guide. Springer Nature, 2023

  71. [72]

    Bergstra and Y

    J. Bergstra and Y . Bengio, ‘‘Random search for hyper-parameter optimiza- tion,’’The journal of machine learning research, vol. 13, no. 1, pp. 281– 305, 2012

  72. [73]

    Probst, A.-L

    P . Probst, A.-L. Boulesteix, and B. Bischl, ‘‘Tunability: Importance of hyperparameters of machine learning algorithms,’’Journal of Machine Learning Research, vol. 20, no. 53, pp. 1–32, 2019. [Online]. Available: http://jmlr.org/papers/v20/18-444.html

  73. [74]

    Snoek, H

    J. Snoek, H. Larochelle, and R. P . Adams, ‘‘Practical bayesian optimization of machine learning algorithms,’’ inAdvances in Neural Information Processing Systems, F. Pereira, C. Burges, L. Bottou, and K. Weinberger, Eds., vol. 25. Curran Associates, Inc., 2012, p. 2951–2959. [Online]. Available: https://proceedings.neurips.cc/paper_files/paper/2012/file/...

  74. [75]

    Hutter, H

    F. Hutter, H. H. Hoos, and K. Leyton-Brown, ‘‘Sequential model-based optimization for general algorithm configuration,’’ inLearning and Intel- ligent Optimization, C. A. C. Coello, Ed. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011, pp. 507–523

  75. [76]

    L. Li, K. Jamieson, G. DeSalvo, A. Rostamizadeh, and A. Talwalkar, ‘‘Hyperband: A novel bandit-based approach to hyperparameter optimization,’’Journal of Machine Learning Research, vol. 18, no. 185, pp. 1–52, 2018. [Online]. Available: http://jmlr.org/papers/v18/16-558.html

  76. [77]

    Falkner, A

    S. Falkner, A. Klein, and F. Hutter, ‘‘BOHB: Robust and efficient hyperparameter optimization at scale,’’ inProceedings of the 35th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, J. Dy and A. Krause, Eds., vol. 80. PMLR, 2018, pp. 1437–1446. [Online]. Available: https://proceedings.mlr.press/ v80/falkner18a.html

  77. [78]

    Bergstra, B

    J. Bergstra, B. Komer, C. Eliasmith, D. Y amins, and D. D. Cox, ‘‘Hyperopt: a python library for model selection and hyperparameter optimization,’’ Computational Science and Discovery, vol. 8, no. 1, p. 014008, jul 2015. [Online]. Available: https://doi.org/10.1088/1749-4699/8/1/014008

  78. [79]

    A. W. v. d. V aart,Delta Method, ser. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 1998, pp. 25– 34

  79. [80]

    [Online]

    Kaggle, ‘‘Kaggle: Y our machine learning and data science community,’’ 2025, accessed: 2025-12-23. [Online]. Available: https://www.kaggle.com

  80. [81]

    Field,Discovering statistics using IBM SPSS statistics

    A. Field,Discovering statistics using IBM SPSS statistics. Sage publica- tions limited, 2024. APPENDIX Proof of Proposition 1.From (8), SB −S ∞ =cB −γ +o(B −γ),(24) and, sinceR=sf·B, we similarly haveS R −S ∞ = csf −γB−γ +o(B −γ). Therefore, SB −S R =c(1−sf −γ)B−γ +o(B −γ).(25) Becausec(1−sf −γ)̸= 0, dividing (24) by (25) yields (9). By reformulating (9) ...

Showing first 80 references.