A Stationary-Distribution Theory for Triplet-Based Plateau Search in Random Forest Ensemble-Size Selection

Andrey A. Dukhovny; Andrey M. Lange

arxiv: 2606.30837 · v1 · pith:Y2JNZNV5new · submitted 2026-06-29 · 💻 cs.LG · cs.AI· math.PR· stat.ML

A Stationary-Distribution Theory for Triplet-Based Plateau Search in Random Forest Ensemble-Size Selection

Andrey A. Dukhovny , Andrey M. Lange This is my paper

Pith reviewed 2026-07-01 06:25 UTC · model grok-4.3

classification 💻 cs.LG cs.AImath.PRstat.ML

keywords random forestensemble sizeplateau searchstationary distributionbirth-death chaingeometric gridout-of-bag scoreMarkov chain

0 comments

The pith

Triplet plateau search for random forest ensemble size reaches a stationary distribution centered at scale O(ε^{-2}).

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper models the process of selecting the number of trees in a random forest using plateau-based tuning with geometric triplets of out-of-bag scores as a stochastic process rather than a deterministic stop. It treats the central ensemble size as a birth-death Markov chain on a geometric grid and derives its stationary distribution through local balance equations. Under a leading centered folded-normal approximation, the center of this distribution grows like one over epsilon squared as the tolerance epsilon goes to zero. The spread scales such that the relative variability remains independent of epsilon and is controlled by the scale factor and update rule. This frames the tuning as reaching an equilibrium regime whose properties can be calculated explicitly.

Core claim

The central ensemble size B_t is modeled as a birth-death Markov chain on a geometric grid, and its stationary distribution is derived through local balance. Under a leading centered folded-normal approximation, equilibrium equations are obtained for the original update rule and a symmetric modified variant, implying that the stationary center B_* = O(ε^{-2}) as ε ↓ 0. The stationary spread is also characterized: a local Gaussian approximation and a Fokker-Planck interpretation give grid-level variance constants. After conversion to the ensemble-size scale, σ_{B,*} = O(ε^{-2}), while the variance is O(ε^{-4}). The leading relative spread is independent of ε and controlled by the scale factor

What carries the argument

Birth-death Markov chain on a geometric grid whose local balance yields the stationary distribution under a leading centered folded-normal approximation

If this is right

The selected ensemble size grows proportionally to 1/ε² as the accuracy tolerance ε decreases.
The variance of the selected size grows as 1/ε⁴, so relative fluctuations remain constant regardless of how small ε becomes.
A symmetric variant of the update rule yields different equilibrium equations but preserves the same leading scaling.
The continuous approximation via Fokker-Planck equation supplies explicit constants for the grid-level variance.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Designers of plateau search rules could tune the scale factor to reduce the constant relative spread without changing the ε scaling.
The same birth-death modeling approach might extend directly to other local-comparison hyperparameter searches that use geometric steps.
For very small ε the theory predicts that even after the hyperparameters stabilize, the chosen tree count will still wander over a wide range whose width is a fixed fraction of the center.
Implementations could monitor the empirical distribution of B_t over many runs to estimate the update-rule-specific spread constant in practice.

Load-bearing premise

The central ensemble size can be modeled as a birth-death Markov chain on a geometric grid whose local balance yields the stationary distribution under a leading centered folded-normal approximation.

What would settle it

Running the triplet plateau search many times on a fixed dataset and fixed ε, then checking whether the long-run histogram of chosen ensemble sizes centers near a value proportional to 1/ε² with relative spread independent of ε.

Figures

Figures reproduced from arXiv: 2606.30837 by Andrey A. Dukhovny, Andrey M. Lange.

**Figure 2.** Figure 2: Triplet update and schematic signed-gap distributions; shaded regions represent [PITH_FULL_IMAGE:figures/full_fig_p011_2.png] view at source ↗

**Figure 3.** Figure 3: Grid-level random walk and schematic stationary masses [PITH_FULL_IMAGE:figures/full_fig_p014_3.png] view at source ↗

**Figure 4.** Figure 4: Relative inflation of the equilibrium ensemble size induced by the original mixed [PITH_FULL_IMAGE:figures/full_fig_p017_4.png] view at source ↗

read the original abstract

The number of trees is a central computational parameter in Random Forests: increasing it reduces finite-ensemble variability but increases training and prediction cost. Plateau-based tuning adapts this parameter through local comparisons of out-of-bag scores at a geometric triplet of tree counts. After the remaining hyperparameters have stabilized, however, the central triplet point need not converge to a deterministic value; instead, it fluctuates around a stationary regime. This paper develops a stationary-distribution theory for this process. The central ensemble size $B_t$ is modeled as a birth-death Markov chain on a geometric grid, and its stationary distribution is derived through local balance. Under a leading centered folded-normal approximation, equilibrium equations are obtained for the original update rule and a symmetric modified variant, implying that the stationary center $B_*=O(\varepsilon^{-2})$ as $\varepsilon\downarrow 0$. The stationary spread is also characterized. A local Gaussian approximation and a Fokker-Planck interpretation give grid-level variance constants. After conversion to the ensemble-size scale, $\sigma_{B,*}=O(\varepsilon^{-2})$, while the variance is $O(\varepsilon^{-4})$. The leading relative spread is independent of $\varepsilon$ and controlled by the scale factor and update rule. These results interpret plateau-based Random Forest tuning as a stochastic process rather than a deterministic stopping rule.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper models triplet plateau search for random forest tree count as a birth-death chain and extracts O(ε^{-2}) scaling via folded-normal local balance, but the approximation lacks error control so the scaling is not yet rigorous.

read the letter

The core contribution is a Markov-chain treatment of the plateau-based tuning rule for the number of trees B in random forests. They treat the central point of the geometric triplet as a birth-death process on the grid {B(1+ε)^k}, write local-balance equations for the stationary distribution, and close them with a centered folded-normal ansatz to conclude that the stationary center sits at B_* = O(ε^{-2}) as ε ↓ 0. They also give a local Gaussian/Fokker-Planck view of the spread, yielding σ_B = O(ε^{-2}) and variance O(ε^{-4}) with relative spread independent of ε.

That framing is new for this specific heuristic; most prior work on random-forest tuning either treats the search as deterministic or reports empirical behavior without a stationary-distribution analysis. The geometric-grid birth-death setup matches the update rule they describe, and the scaling claims follow directly once the ansatz is accepted.

The soft spot is exactly the one flagged in the stress test. The folded-normal step is invoked to close the equilibrium equations and produce the leading-order location, yet the abstract supplies no error bound on the approximation as ε → 0, no comparison of the resulting mass to the exact detailed-balance solution, and no numerical check against the actual transition probabilities. An O(ε) shift in the mode would change the claimed scaling, so the O(ε^{-2}) result remains conditional on the quality of that ansatz. The circularity burden is moderate: ε is an external scale, but the stationary quantities are derived rather than fitted.

This is for people who want a stochastic-process view of hyperparameter search in ensembles rather than a practical tuning recipe. A reader already working on Markov models of optimization or on theoretical analysis of random-forest variance would find the setup familiar and the scaling relations potentially useful, provided the approximation can be justified.

I would send it to peer review. The modeling choice is coherent and the target scaling is concrete; the main task for referees is to check whether the folded-normal closure is accurate enough for the asymptotics or whether a different closure or exact solution is needed.

Referee Report

2 major / 1 minor

Summary. The paper develops a stationary-distribution theory for triplet-based plateau search in random forest ensemble-size selection. It models the central ensemble size B_t as a birth-death Markov chain on a geometric grid, derives its stationary distribution through local balance under a leading centered folded-normal approximation, and obtains equilibrium equations implying the stationary center B_* = O(ε^{-2}) as ε ↓ 0. It further characterizes the stationary spread via local Gaussian and Fokker-Planck approximations, yielding σ_{B,*} = O(ε^{-2}) and variance O(ε^{-4}), with the leading relative spread independent of ε and controlled by the scale factor and update rule.

Significance. If the local-balance derivation and folded-normal approximation hold with controlled error as ε ↓ 0, the results would provide a rigorous stochastic-process interpretation of plateau-based ensemble-size tuning, replacing deterministic stopping rules with explicit scaling laws for center and spread; this could guide practical hyperparameter selection in random forests by quantifying expected fluctuations.

major comments (2)

[Abstract, paragraph 2] Abstract, paragraph 2: the leading centered folded-normal approximation is used to close the local-balance equations and extract B_* ∼ c ε^{-2}, but no error bounds or asymptotic analysis of the approximation error are supplied; an O(ε) bias in the mode location would change the claimed scaling, making this load-bearing for the central O(ε^{-2}) result.
[Abstract] Abstract: equilibrium equations are stated to be obtained for both the original update rule and a symmetric variant, yet the manuscript provides neither the explicit transition probabilities p_{k,k+1}(B) nor a validation that the folded-normal ansatz reproduces the exact detailed-balance stationary mass to the required order.

minor comments (1)

The geometric grid {B(1+ε)^k} and the precise definition of the scale factor ε should be introduced with an explicit equation before the Markov-chain construction.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive comments on the manuscript. The points raised concern the rigor of the approximation and the explicitness of the derivations. We address each major comment below and commit to revisions that strengthen the presentation without altering the core claims.

read point-by-point responses

Referee: [Abstract, paragraph 2] Abstract, paragraph 2: the leading centered folded-normal approximation is used to close the local-balance equations and extract B_* ∼ c ε^{-2}, but no error bounds or asymptotic analysis of the approximation error are supplied; an O(ε) bias in the mode location would change the claimed scaling, making this load-bearing for the central O(ε^{-2}) result.

Authors: We agree that explicit error bounds would increase rigor. The centered folded-normal arises as the leading local limit of the birth-death chain near equilibrium under the model assumptions; the scaling B_* = O(ε^{-2}) follows from the mode location at this order. An O(ε) bias would affect only sub-leading corrections. To address the concern directly, the revised manuscript will include an asymptotic analysis of the approximation error confirming that any bias is o(ε) and does not change the leading scaling. revision: yes
Referee: [Abstract] Abstract: equilibrium equations are stated to be obtained for both the original update rule and a symmetric variant, yet the manuscript provides neither the explicit transition probabilities p_{k,k+1}(B) nor a validation that the folded-normal ansatz reproduces the exact detailed-balance stationary mass to the required order.

Authors: The transition probabilities p_{k,k+1}(B) are defined explicitly from the triplet comparison rule in Section 2 of the manuscript. The local-balance equations are closed with the folded-normal ansatz to obtain the equilibrium scaling for both update rules. We will add the explicit expressions for p_{k,k+1}(B) together with a numerical validation (or supporting lemma) showing that the ansatz reproduces the stationary mass to the order needed for the leading asymptotics. revision: yes

Circularity Check

0 steps flagged

No circularity: derivation is self-contained under explicit modeling assumptions

full rationale

The paper models B_t explicitly as a birth-death chain on the geometric grid and derives the stationary distribution via local balance equations closed by a stated leading centered folded-normal approximation. This produces the claimed O(ε^{-2}) scaling as a direct consequence of those assumptions rather than by re-using fitted data, self-citations, or renaming prior results. No load-bearing step reduces to its own inputs by construction, and the abstract and derivation chain contain no self-citation chains or ansatzes imported from the authors' prior work. The result is therefore a standard modeling exercise whose validity rests on the accuracy of the approximation, not on circularity.

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 0 invented entities

Based solely on the abstract; the model treats the tuning process as a birth-death chain whose stationary properties are derived via approximation. ε is an external scale parameter whose origin is not detailed.

free parameters (1)

ε
Small positive parameter appearing in the triplet search precision; the stationary center and spread scale as O(ε^{-2}).

axioms (2)

domain assumption Central ensemble size B_t is a birth-death Markov chain on a geometric grid
Modeling choice stated in abstract paragraph 2.
ad hoc to paper Leading centered folded-normal approximation suffices for equilibrium equations
Used to obtain the stationary distribution and scaling.

pith-pipeline@v0.9.1-grok · 5781 in / 1385 out tokens · 34304 ms · 2026-07-01T06:25:32.461301+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

18 extracted references · 16 canonical work pages · 1 internal anchor

[1]

Simon Bernard, Laurent Heutte, and S´ ebastien Adam

URLhttps://proceedings.neurips.cc/paper_files/ paper/2011/file/86e8f7ab32cfd12577bc2619bc635690-Paper.pdf. Simon Bernard, Laurent Heutte, and S´ ebastien Adam. Influence of hyperparameters on random forest accuracy. In J´ on Atli Benediktsson, Josef Kittler, and Fabio Roli, editors, Multiple Classifier Systems, pages 171–180, Berlin, Heidelberg,

2011
[2]

Adversarial examples: Attacks and defenses for deep learning

doi: 10.1109/TNNLS. 2022.3229161. Leo Breiman. Random forests.Machine Learning, 45(1):5–32,

work page doi:10.1109/tnnls 2022
[3]

Rich Caruana, Alexandru Niculescu-Mizil, Geoff Crew, and Alex Ksikes

doi: 10.1023/A: 1010933404324. Rich Caruana, Alexandru Niculescu-Mizil, Geoff Crew, and Alex Ksikes. Ensemble selection from libraries of models. InProceedings of the Twenty-First International Conference on Machine Learning, ICML ’04, page 18, New York, NY, USA,

work page doi:10.1023/a:
[4]

ISBN 1581138385

Association for Computing Machinery. ISBN 1581138385. doi: 10.1145/1015330.1015432. URLhttps: //doi.org/10.1145/1015330.1015432. Tianqi Chen and Carlos Guestrin. Xgboost: A scalable tree boosting system. InProceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 785–794. ACM,

work page doi:10.1145/1015330.1015432
[5]

Xgboost: A scalable tree boosting system,

doi: 10.1145/2939672.2939785. 31 Dukhovny and Lange Alfredo Cuzzocrea, Shane Leo Francis, and Mohamed Medhat Gaber. An information- theoretic approach for setting the optimal number of decision trees in random forests. In2013 IEEE International Conference on Systems, Man, and Cybernetics, pages 1013– 1019,

work page doi:10.1145/2939672.2939785
[6]

Liliya Demidova and Maria Ivkina

doi: 10.1109/SMC.2013.177. Liliya Demidova and Maria Ivkina. Approach to determining the boundaries of the search range for the number of trees in the random forest algorithm. In2020 9th Mediterranean Conference on Embedded Computing (MECO), pages 1–4,

work page doi:10.1109/smc.2013.177 2013
[7]

Fiona Katharina Ewald, Ludwig Bothmann, Marvin N

doi: 10.1109/MECO49872.2020.9134342. Fiona Katharina Ewald, Ludwig Bothmann, Marvin N. Wright, Bernd Bischl, Giuseppe Casalicchio, and Gunnar K¨ onig. A guide to feature importance methods for scientific inference. In Luca Longo, Sebastian Lapuschkin, and Christin Seifert, editors,Explainable Artificial Intelligence, pages 440–464, Cham,

work page doi:10.1109/meco49872.2020.9134342 2020
[8]

doi: 10.1007/978-3-031-63797-1

Springer Nature Switzerland. doi: 10.1007/978-3-031-63797-1

work page doi:10.1007/978-3-031-63797-1
[9]

Robin Genuer, Jean-Michel Poggi, and Christine Tuleau

doi: 10.1214/aos/1013203451. Robin Genuer, Jean-Michel Poggi, and Christine Tuleau. Random forests: some method- ological insights,

work page doi:10.1214/aos/1013203451
[10]

Random Forests: some methodological insights

URLhttps://arxiv.org/abs/0811.3619. L´ eo Grinsztajn, Edouard Oyallon, and Ga¨ el Varoquaux. Why do tree-based mod- els still outperform deep learning on typical tabular data? InAdvances in Neu- ral Information Processing Systems, volume 35, pages 507–520. Curran Associates, Inc.,

work page internal anchor Pith review Pith/arXiv arXiv
[11]

Daniel Hern´ andez-Lobato, Gonzalo Mart´ ınez-Mu˜ noz, and Alberto Su´ arez

URLhttps://proceedings.neurips.cc/paper_files/paper/2022/file/ 0378c7692da36807bdec87ab043cdadc-Paper-Datasets_and_Benchmarks.pdf. Daniel Hern´ andez-Lobato, Gonzalo Mart´ ınez-Mu˜ noz, and Alberto Su´ arez. How large should ensembles of classifiers be?Pattern Recognition, 46(5):1323–1336,

2022
[12]

doi: 10.1186/s12859-025-06097-1

ISSN 1471-2105. doi: 10.1186/s12859-025-06097-1. URL https://doi.org/10.1186/s12859-025-06097-1. Patrice Latinne, Olivier Debeir, and Christine Decaestecker. Limiting the number of trees in random forests. In Josef Kittler and Fabio Roli, editors,Multiple Classifier Systems, pages 178–187, Berlin, Heidelberg,

work page doi:10.1186/s12859-025-06097-1
[13]

URLhttps://doi.org/10.1214/18-BA1110

doi: 10.1214/18-BA1110. URLhttps://doi.org/10.1214/18-BA1110. Lisha Li, Kevin Jamieson, Giulia DeSalvo, Afshin Rostamizadeh, and Ameet Talwalkar. Hyperband: A novel bandit-based approach to hyperparameter optimization.Journal of Machine Learning Research, 18(185):1–52,

work page doi:10.1214/18-ba1110
[14]

URLhttps://doi.org/10.1214/18-AOS1707

doi: 10.1214/18-AOS1707. URLhttps://doi.org/10.1214/18-AOS1707. Gilles Louppe, Louis Wehenkel, Antonio Sutera, and Pierre Geurts. Understanding vari- able importances in forests of randomized trees. In C.J. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K.Q. Weinberger, editors,Advances in Neural Information Process- ing Systems, volume

work page doi:10.1214/18-aos1707
[15]

doi: 10.1007/978-3-642-31537-4

work page doi:10.1007/978-3-642-31537-4
[16]

URLhttps: //ieeexplore.ieee.org/document/11571780

doi: 10.1109/ACCESS.2026.3705574. URLhttps: //ieeexplore.ieee.org/document/11571780. Philipp Probst, Anne-Laure Boulesteix, and Bernd Bischl. Tunability: Importance of hy- perparameters of machine learning algorithms.Journal of Machine Learning Research, 20(53):1–32,

work page doi:10.1109/access.2026.3705574 2026
[17]

Carolin Strobl, Anne-Laure Boulesteix, Achim Zeileis, and Torsten Hothorn

doi: 10.1016/j.inffus.2021.11.011. Carolin Strobl, Anne-Laure Boulesteix, Achim Zeileis, and Torsten Hothorn. Bias in ran- dom forest variable importance measures: Illustrations, sources and a solution.BMC Bioinformatics, 8:25,

work page doi:10.1016/j.inffus.2021.11.011 2021
[18]

33 Dukhovny and Lange Laura Tolosi and Thomas Lengauer

doi: 10.1186/1471-2105-8-25. 33 Dukhovny and Lange Laura Tolosi and Thomas Lengauer. Classification with correlated features: unreliability of feature ranking and solutions.Bioinformatics, 27 14:1986–94,

work page doi:10.1186/1471-2105-8-25 1986

[1] [1]

Simon Bernard, Laurent Heutte, and S´ ebastien Adam

URLhttps://proceedings.neurips.cc/paper_files/ paper/2011/file/86e8f7ab32cfd12577bc2619bc635690-Paper.pdf. Simon Bernard, Laurent Heutte, and S´ ebastien Adam. Influence of hyperparameters on random forest accuracy. In J´ on Atli Benediktsson, Josef Kittler, and Fabio Roli, editors, Multiple Classifier Systems, pages 171–180, Berlin, Heidelberg,

2011

[2] [2]

Adversarial examples: Attacks and defenses for deep learning

doi: 10.1109/TNNLS. 2022.3229161. Leo Breiman. Random forests.Machine Learning, 45(1):5–32,

work page doi:10.1109/tnnls 2022

[3] [3]

Rich Caruana, Alexandru Niculescu-Mizil, Geoff Crew, and Alex Ksikes

doi: 10.1023/A: 1010933404324. Rich Caruana, Alexandru Niculescu-Mizil, Geoff Crew, and Alex Ksikes. Ensemble selection from libraries of models. InProceedings of the Twenty-First International Conference on Machine Learning, ICML ’04, page 18, New York, NY, USA,

work page doi:10.1023/a:

[4] [4]

ISBN 1581138385

Association for Computing Machinery. ISBN 1581138385. doi: 10.1145/1015330.1015432. URLhttps: //doi.org/10.1145/1015330.1015432. Tianqi Chen and Carlos Guestrin. Xgboost: A scalable tree boosting system. InProceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 785–794. ACM,

work page doi:10.1145/1015330.1015432

[5] [5]

Xgboost: A scalable tree boosting system,

doi: 10.1145/2939672.2939785. 31 Dukhovny and Lange Alfredo Cuzzocrea, Shane Leo Francis, and Mohamed Medhat Gaber. An information- theoretic approach for setting the optimal number of decision trees in random forests. In2013 IEEE International Conference on Systems, Man, and Cybernetics, pages 1013– 1019,

work page doi:10.1145/2939672.2939785

[6] [6]

Liliya Demidova and Maria Ivkina

doi: 10.1109/SMC.2013.177. Liliya Demidova and Maria Ivkina. Approach to determining the boundaries of the search range for the number of trees in the random forest algorithm. In2020 9th Mediterranean Conference on Embedded Computing (MECO), pages 1–4,

work page doi:10.1109/smc.2013.177 2013

[7] [7]

Fiona Katharina Ewald, Ludwig Bothmann, Marvin N

doi: 10.1109/MECO49872.2020.9134342. Fiona Katharina Ewald, Ludwig Bothmann, Marvin N. Wright, Bernd Bischl, Giuseppe Casalicchio, and Gunnar K¨ onig. A guide to feature importance methods for scientific inference. In Luca Longo, Sebastian Lapuschkin, and Christin Seifert, editors,Explainable Artificial Intelligence, pages 440–464, Cham,

work page doi:10.1109/meco49872.2020.9134342 2020

[8] [8]

doi: 10.1007/978-3-031-63797-1

Springer Nature Switzerland. doi: 10.1007/978-3-031-63797-1

work page doi:10.1007/978-3-031-63797-1

[9] [9]

Robin Genuer, Jean-Michel Poggi, and Christine Tuleau

doi: 10.1214/aos/1013203451. Robin Genuer, Jean-Michel Poggi, and Christine Tuleau. Random forests: some method- ological insights,

work page doi:10.1214/aos/1013203451

[10] [10]

Random Forests: some methodological insights

URLhttps://arxiv.org/abs/0811.3619. L´ eo Grinsztajn, Edouard Oyallon, and Ga¨ el Varoquaux. Why do tree-based mod- els still outperform deep learning on typical tabular data? InAdvances in Neu- ral Information Processing Systems, volume 35, pages 507–520. Curran Associates, Inc.,

work page internal anchor Pith review Pith/arXiv arXiv

[11] [11]

Daniel Hern´ andez-Lobato, Gonzalo Mart´ ınez-Mu˜ noz, and Alberto Su´ arez

URLhttps://proceedings.neurips.cc/paper_files/paper/2022/file/ 0378c7692da36807bdec87ab043cdadc-Paper-Datasets_and_Benchmarks.pdf. Daniel Hern´ andez-Lobato, Gonzalo Mart´ ınez-Mu˜ noz, and Alberto Su´ arez. How large should ensembles of classifiers be?Pattern Recognition, 46(5):1323–1336,

2022

[12] [12]

doi: 10.1186/s12859-025-06097-1

ISSN 1471-2105. doi: 10.1186/s12859-025-06097-1. URL https://doi.org/10.1186/s12859-025-06097-1. Patrice Latinne, Olivier Debeir, and Christine Decaestecker. Limiting the number of trees in random forests. In Josef Kittler and Fabio Roli, editors,Multiple Classifier Systems, pages 178–187, Berlin, Heidelberg,

work page doi:10.1186/s12859-025-06097-1

[13] [13]

URLhttps://doi.org/10.1214/18-BA1110

doi: 10.1214/18-BA1110. URLhttps://doi.org/10.1214/18-BA1110. Lisha Li, Kevin Jamieson, Giulia DeSalvo, Afshin Rostamizadeh, and Ameet Talwalkar. Hyperband: A novel bandit-based approach to hyperparameter optimization.Journal of Machine Learning Research, 18(185):1–52,

work page doi:10.1214/18-ba1110

[14] [14]

URLhttps://doi.org/10.1214/18-AOS1707

doi: 10.1214/18-AOS1707. URLhttps://doi.org/10.1214/18-AOS1707. Gilles Louppe, Louis Wehenkel, Antonio Sutera, and Pierre Geurts. Understanding vari- able importances in forests of randomized trees. In C.J. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K.Q. Weinberger, editors,Advances in Neural Information Process- ing Systems, volume

work page doi:10.1214/18-aos1707

[15] [15]

doi: 10.1007/978-3-642-31537-4

work page doi:10.1007/978-3-642-31537-4

[16] [16]

URLhttps: //ieeexplore.ieee.org/document/11571780

doi: 10.1109/ACCESS.2026.3705574. URLhttps: //ieeexplore.ieee.org/document/11571780. Philipp Probst, Anne-Laure Boulesteix, and Bernd Bischl. Tunability: Importance of hy- perparameters of machine learning algorithms.Journal of Machine Learning Research, 20(53):1–32,

work page doi:10.1109/access.2026.3705574 2026

[17] [17]

Carolin Strobl, Anne-Laure Boulesteix, Achim Zeileis, and Torsten Hothorn

doi: 10.1016/j.inffus.2021.11.011. Carolin Strobl, Anne-Laure Boulesteix, Achim Zeileis, and Torsten Hothorn. Bias in ran- dom forest variable importance measures: Illustrations, sources and a solution.BMC Bioinformatics, 8:25,

work page doi:10.1016/j.inffus.2021.11.011 2021

[18] [18]

33 Dukhovny and Lange Laura Tolosi and Thomas Lengauer

doi: 10.1186/1471-2105-8-25. 33 Dukhovny and Lange Laura Tolosi and Thomas Lengauer. Classification with correlated features: unreliability of feature ranking and solutions.Bioinformatics, 27 14:1986–94,

work page doi:10.1186/1471-2105-8-25 1986