pith. sign in

arxiv: 2606.30837 · v1 · pith:Y2JNZNV5new · submitted 2026-06-29 · 💻 cs.LG · cs.AI· math.PR· stat.ML

A Stationary-Distribution Theory for Triplet-Based Plateau Search in Random Forest Ensemble-Size Selection

Pith reviewed 2026-07-01 06:25 UTC · model grok-4.3

classification 💻 cs.LG cs.AImath.PRstat.ML
keywords random forestensemble sizeplateau searchstationary distributionbirth-death chaingeometric gridout-of-bag scoreMarkov chain
0
0 comments X

The pith

Triplet plateau search for random forest ensemble size reaches a stationary distribution centered at scale O(ε^{-2}).

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper models the process of selecting the number of trees in a random forest using plateau-based tuning with geometric triplets of out-of-bag scores as a stochastic process rather than a deterministic stop. It treats the central ensemble size as a birth-death Markov chain on a geometric grid and derives its stationary distribution through local balance equations. Under a leading centered folded-normal approximation, the center of this distribution grows like one over epsilon squared as the tolerance epsilon goes to zero. The spread scales such that the relative variability remains independent of epsilon and is controlled by the scale factor and update rule. This frames the tuning as reaching an equilibrium regime whose properties can be calculated explicitly.

Core claim

The central ensemble size B_t is modeled as a birth-death Markov chain on a geometric grid, and its stationary distribution is derived through local balance. Under a leading centered folded-normal approximation, equilibrium equations are obtained for the original update rule and a symmetric modified variant, implying that the stationary center B_* = O(ε^{-2}) as ε ↓ 0. The stationary spread is also characterized: a local Gaussian approximation and a Fokker-Planck interpretation give grid-level variance constants. After conversion to the ensemble-size scale, σ_{B,*} = O(ε^{-2}), while the variance is O(ε^{-4}). The leading relative spread is independent of ε and controlled by the scale factor

What carries the argument

Birth-death Markov chain on a geometric grid whose local balance yields the stationary distribution under a leading centered folded-normal approximation

If this is right

  • The selected ensemble size grows proportionally to 1/ε² as the accuracy tolerance ε decreases.
  • The variance of the selected size grows as 1/ε⁴, so relative fluctuations remain constant regardless of how small ε becomes.
  • A symmetric variant of the update rule yields different equilibrium equations but preserves the same leading scaling.
  • The continuous approximation via Fokker-Planck equation supplies explicit constants for the grid-level variance.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Designers of plateau search rules could tune the scale factor to reduce the constant relative spread without changing the ε scaling.
  • The same birth-death modeling approach might extend directly to other local-comparison hyperparameter searches that use geometric steps.
  • For very small ε the theory predicts that even after the hyperparameters stabilize, the chosen tree count will still wander over a wide range whose width is a fixed fraction of the center.
  • Implementations could monitor the empirical distribution of B_t over many runs to estimate the update-rule-specific spread constant in practice.

Load-bearing premise

The central ensemble size can be modeled as a birth-death Markov chain on a geometric grid whose local balance yields the stationary distribution under a leading centered folded-normal approximation.

What would settle it

Running the triplet plateau search many times on a fixed dataset and fixed ε, then checking whether the long-run histogram of chosen ensemble sizes centers near a value proportional to 1/ε² with relative spread independent of ε.

Figures

Figures reproduced from arXiv: 2606.30837 by Andrey A. Dukhovny, Andrey M. Lange.

Figure 1
Figure 1. Figure 1: Examples of random and mean trajectories of the central triplet point [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Triplet update and schematic signed-gap distributions; shaded regions represent [PITH_FULL_IMAGE:figures/full_fig_p011_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Grid-level random walk and schematic stationary masses [PITH_FULL_IMAGE:figures/full_fig_p014_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Relative inflation of the equilibrium ensemble size induced by the original mixed [PITH_FULL_IMAGE:figures/full_fig_p017_4.png] view at source ↗
read the original abstract

The number of trees is a central computational parameter in Random Forests: increasing it reduces finite-ensemble variability but increases training and prediction cost. Plateau-based tuning adapts this parameter through local comparisons of out-of-bag scores at a geometric triplet of tree counts. After the remaining hyperparameters have stabilized, however, the central triplet point need not converge to a deterministic value; instead, it fluctuates around a stationary regime. This paper develops a stationary-distribution theory for this process. The central ensemble size $B_t$ is modeled as a birth-death Markov chain on a geometric grid, and its stationary distribution is derived through local balance. Under a leading centered folded-normal approximation, equilibrium equations are obtained for the original update rule and a symmetric modified variant, implying that the stationary center $B_*=O(\varepsilon^{-2})$ as $\varepsilon\downarrow 0$. The stationary spread is also characterized. A local Gaussian approximation and a Fokker-Planck interpretation give grid-level variance constants. After conversion to the ensemble-size scale, $\sigma_{B,*}=O(\varepsilon^{-2})$, while the variance is $O(\varepsilon^{-4})$. The leading relative spread is independent of $\varepsilon$ and controlled by the scale factor and update rule. These results interpret plateau-based Random Forest tuning as a stochastic process rather than a deterministic stopping rule.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper develops a stationary-distribution theory for triplet-based plateau search in random forest ensemble-size selection. It models the central ensemble size B_t as a birth-death Markov chain on a geometric grid, derives its stationary distribution through local balance under a leading centered folded-normal approximation, and obtains equilibrium equations implying the stationary center B_* = O(ε^{-2}) as ε ↓ 0. It further characterizes the stationary spread via local Gaussian and Fokker-Planck approximations, yielding σ_{B,*} = O(ε^{-2}) and variance O(ε^{-4}), with the leading relative spread independent of ε and controlled by the scale factor and update rule.

Significance. If the local-balance derivation and folded-normal approximation hold with controlled error as ε ↓ 0, the results would provide a rigorous stochastic-process interpretation of plateau-based ensemble-size tuning, replacing deterministic stopping rules with explicit scaling laws for center and spread; this could guide practical hyperparameter selection in random forests by quantifying expected fluctuations.

major comments (2)
  1. [Abstract, paragraph 2] Abstract, paragraph 2: the leading centered folded-normal approximation is used to close the local-balance equations and extract B_* ∼ c ε^{-2}, but no error bounds or asymptotic analysis of the approximation error are supplied; an O(ε) bias in the mode location would change the claimed scaling, making this load-bearing for the central O(ε^{-2}) result.
  2. [Abstract] Abstract: equilibrium equations are stated to be obtained for both the original update rule and a symmetric variant, yet the manuscript provides neither the explicit transition probabilities p_{k,k+1}(B) nor a validation that the folded-normal ansatz reproduces the exact detailed-balance stationary mass to the required order.
minor comments (1)
  1. The geometric grid {B(1+ε)^k} and the precise definition of the scale factor ε should be introduced with an explicit equation before the Markov-chain construction.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive comments on the manuscript. The points raised concern the rigor of the approximation and the explicitness of the derivations. We address each major comment below and commit to revisions that strengthen the presentation without altering the core claims.

read point-by-point responses
  1. Referee: [Abstract, paragraph 2] Abstract, paragraph 2: the leading centered folded-normal approximation is used to close the local-balance equations and extract B_* ∼ c ε^{-2}, but no error bounds or asymptotic analysis of the approximation error are supplied; an O(ε) bias in the mode location would change the claimed scaling, making this load-bearing for the central O(ε^{-2}) result.

    Authors: We agree that explicit error bounds would increase rigor. The centered folded-normal arises as the leading local limit of the birth-death chain near equilibrium under the model assumptions; the scaling B_* = O(ε^{-2}) follows from the mode location at this order. An O(ε) bias would affect only sub-leading corrections. To address the concern directly, the revised manuscript will include an asymptotic analysis of the approximation error confirming that any bias is o(ε) and does not change the leading scaling. revision: yes

  2. Referee: [Abstract] Abstract: equilibrium equations are stated to be obtained for both the original update rule and a symmetric variant, yet the manuscript provides neither the explicit transition probabilities p_{k,k+1}(B) nor a validation that the folded-normal ansatz reproduces the exact detailed-balance stationary mass to the required order.

    Authors: The transition probabilities p_{k,k+1}(B) are defined explicitly from the triplet comparison rule in Section 2 of the manuscript. The local-balance equations are closed with the folded-normal ansatz to obtain the equilibrium scaling for both update rules. We will add the explicit expressions for p_{k,k+1}(B) together with a numerical validation (or supporting lemma) showing that the ansatz reproduces the stationary mass to the order needed for the leading asymptotics. revision: yes

Circularity Check

0 steps flagged

No circularity: derivation is self-contained under explicit modeling assumptions

full rationale

The paper models B_t explicitly as a birth-death chain on the geometric grid and derives the stationary distribution via local balance equations closed by a stated leading centered folded-normal approximation. This produces the claimed O(ε^{-2}) scaling as a direct consequence of those assumptions rather than by re-using fitted data, self-citations, or renaming prior results. No load-bearing step reduces to its own inputs by construction, and the abstract and derivation chain contain no self-citation chains or ansatzes imported from the authors' prior work. The result is therefore a standard modeling exercise whose validity rests on the accuracy of the approximation, not on circularity.

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 0 invented entities

Based solely on the abstract; the model treats the tuning process as a birth-death chain whose stationary properties are derived via approximation. ε is an external scale parameter whose origin is not detailed.

free parameters (1)
  • ε
    Small positive parameter appearing in the triplet search precision; the stationary center and spread scale as O(ε^{-2}).
axioms (2)
  • domain assumption Central ensemble size B_t is a birth-death Markov chain on a geometric grid
    Modeling choice stated in abstract paragraph 2.
  • ad hoc to paper Leading centered folded-normal approximation suffices for equilibrium equations
    Used to obtain the stationary distribution and scaling.

pith-pipeline@v0.9.1-grok · 5781 in / 1385 out tokens · 34304 ms · 2026-07-01T06:25:32.461301+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

18 extracted references · 16 canonical work pages · 1 internal anchor

  1. [1]

    Simon Bernard, Laurent Heutte, and S´ ebastien Adam

    URLhttps://proceedings.neurips.cc/paper_files/ paper/2011/file/86e8f7ab32cfd12577bc2619bc635690-Paper.pdf. Simon Bernard, Laurent Heutte, and S´ ebastien Adam. Influence of hyperparameters on random forest accuracy. In J´ on Atli Benediktsson, Josef Kittler, and Fabio Roli, editors, Multiple Classifier Systems, pages 171–180, Berlin, Heidelberg,

  2. [2]

    Adversarial examples: Attacks and defenses for deep learning

    doi: 10.1109/TNNLS. 2022.3229161. Leo Breiman. Random forests.Machine Learning, 45(1):5–32,

  3. [3]

    Rich Caruana, Alexandru Niculescu-Mizil, Geoff Crew, and Alex Ksikes

    doi: 10.1023/A: 1010933404324. Rich Caruana, Alexandru Niculescu-Mizil, Geoff Crew, and Alex Ksikes. Ensemble selection from libraries of models. InProceedings of the Twenty-First International Conference on Machine Learning, ICML ’04, page 18, New York, NY, USA,

  4. [4]

    ISBN 1581138385

    Association for Computing Machinery. ISBN 1581138385. doi: 10.1145/1015330.1015432. URLhttps: //doi.org/10.1145/1015330.1015432. Tianqi Chen and Carlos Guestrin. Xgboost: A scalable tree boosting system. InProceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 785–794. ACM,

  5. [5]

    Xgboost: A scalable tree boosting system,

    doi: 10.1145/2939672.2939785. 31 Dukhovny and Lange Alfredo Cuzzocrea, Shane Leo Francis, and Mohamed Medhat Gaber. An information- theoretic approach for setting the optimal number of decision trees in random forests. In2013 IEEE International Conference on Systems, Man, and Cybernetics, pages 1013– 1019,

  6. [6]

    Liliya Demidova and Maria Ivkina

    doi: 10.1109/SMC.2013.177. Liliya Demidova and Maria Ivkina. Approach to determining the boundaries of the search range for the number of trees in the random forest algorithm. In2020 9th Mediterranean Conference on Embedded Computing (MECO), pages 1–4,

  7. [7]

    Fiona Katharina Ewald, Ludwig Bothmann, Marvin N

    doi: 10.1109/MECO49872.2020.9134342. Fiona Katharina Ewald, Ludwig Bothmann, Marvin N. Wright, Bernd Bischl, Giuseppe Casalicchio, and Gunnar K¨ onig. A guide to feature importance methods for scientific inference. In Luca Longo, Sebastian Lapuschkin, and Christin Seifert, editors,Explainable Artificial Intelligence, pages 440–464, Cham,

  8. [8]

    doi: 10.1007/978-3-031-63797-1

    Springer Nature Switzerland. doi: 10.1007/978-3-031-63797-1

  9. [9]

    Robin Genuer, Jean-Michel Poggi, and Christine Tuleau

    doi: 10.1214/aos/1013203451. Robin Genuer, Jean-Michel Poggi, and Christine Tuleau. Random forests: some method- ological insights,

  10. [10]

    Random Forests: some methodological insights

    URLhttps://arxiv.org/abs/0811.3619. L´ eo Grinsztajn, Edouard Oyallon, and Ga¨ el Varoquaux. Why do tree-based mod- els still outperform deep learning on typical tabular data? InAdvances in Neu- ral Information Processing Systems, volume 35, pages 507–520. Curran Associates, Inc.,

  11. [11]

    Daniel Hern´ andez-Lobato, Gonzalo Mart´ ınez-Mu˜ noz, and Alberto Su´ arez

    URLhttps://proceedings.neurips.cc/paper_files/paper/2022/file/ 0378c7692da36807bdec87ab043cdadc-Paper-Datasets_and_Benchmarks.pdf. Daniel Hern´ andez-Lobato, Gonzalo Mart´ ınez-Mu˜ noz, and Alberto Su´ arez. How large should ensembles of classifiers be?Pattern Recognition, 46(5):1323–1336,

  12. [12]

    doi: 10.1186/s12859-025-06097-1

    ISSN 1471-2105. doi: 10.1186/s12859-025-06097-1. URL https://doi.org/10.1186/s12859-025-06097-1. Patrice Latinne, Olivier Debeir, and Christine Decaestecker. Limiting the number of trees in random forests. In Josef Kittler and Fabio Roli, editors,Multiple Classifier Systems, pages 178–187, Berlin, Heidelberg,

  13. [13]

    URLhttps://doi.org/10.1214/18-BA1110

    doi: 10.1214/18-BA1110. URLhttps://doi.org/10.1214/18-BA1110. Lisha Li, Kevin Jamieson, Giulia DeSalvo, Afshin Rostamizadeh, and Ameet Talwalkar. Hyperband: A novel bandit-based approach to hyperparameter optimization.Journal of Machine Learning Research, 18(185):1–52,

  14. [14]

    URLhttps://doi.org/10.1214/18-AOS1707

    doi: 10.1214/18-AOS1707. URLhttps://doi.org/10.1214/18-AOS1707. Gilles Louppe, Louis Wehenkel, Antonio Sutera, and Pierre Geurts. Understanding vari- able importances in forests of randomized trees. In C.J. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K.Q. Weinberger, editors,Advances in Neural Information Process- ing Systems, volume

  15. [15]

    doi: 10.1007/978-3-642-31537-4

  16. [16]

    URLhttps: //ieeexplore.ieee.org/document/11571780

    doi: 10.1109/ACCESS.2026.3705574. URLhttps: //ieeexplore.ieee.org/document/11571780. Philipp Probst, Anne-Laure Boulesteix, and Bernd Bischl. Tunability: Importance of hy- perparameters of machine learning algorithms.Journal of Machine Learning Research, 20(53):1–32,

  17. [17]

    Carolin Strobl, Anne-Laure Boulesteix, Achim Zeileis, and Torsten Hothorn

    doi: 10.1016/j.inffus.2021.11.011. Carolin Strobl, Anne-Laure Boulesteix, Achim Zeileis, and Torsten Hothorn. Bias in ran- dom forest variable importance measures: Illustrations, sources and a solution.BMC Bioinformatics, 8:25,

  18. [18]

    33 Dukhovny and Lange Laura Tolosi and Thomas Lengauer

    doi: 10.1186/1471-2105-8-25. 33 Dukhovny and Lange Laura Tolosi and Thomas Lengauer. Classification with correlated features: unreliability of feature ranking and solutions.Bioinformatics, 27 14:1986–94,