Consistency of Honest Decision Trees and Random Forests

Martin Bladt; Rasmus Frigaard Lemvig

arxiv: 2601.14991 · v2 · pith:BKTN6QKRnew · submitted 2026-01-21 · 📊 stat.ME · stat.ML

Consistency of Honest Decision Trees and Random Forests

Martin Bladt , Rasmus Frigaard Lemvig This is my paper

Pith reviewed 2026-05-21 15:51 UTC · model grok-4.3

classification 📊 stat.ME stat.ML

keywords decision treesrandom forestsconsistencyregression estimationconvergencehonest treespartitioning methods

0 comments

The pith

Honest decision trees and random forests converge to the true regression function.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that honest decision trees and their random forest averages are consistent estimators for the regression function. Separating the data used to choose splits from the data used to estimate the average response in each leaf lets the proofs follow the same steps as classical kernel smoothing arguments. This yields weak convergence, almost sure convergence, and uniform convergence over compact sets under mild conditions on the target function and the data distribution. The same framework covers subsampling ensembles and a two-stage bootstrap scheme while recovering earlier results as special cases.

Core claim

Under mild regularity conditions on the regression function and data distribution, honest trees and honest forest averages converge weakly and almost surely to the true regression function, and moreover uniform convergence holds over compact covariate domains.

What carries the argument

Honesty in tree construction, which uses one subsample to determine the partitions and a disjoint subsample to compute the response average inside each terminal node.

If this is right

The framework covers ensemble methods that rely on subsampling.
A two-stage bootstrap sampling scheme fits inside the same arguments.
Several earlier consistency statements for trees and forests become direct corollaries.
Data-adaptive partitioning is shown to behave like kernel smoothing for asymptotic purposes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Honesty may be required for consistency in other adaptive partitioning schemes beyond trees.
Uniform convergence results could be used to construct valid confidence bands around tree predictions.
The elementary smoothing-style proofs might simplify analysis of additional splitting-based estimators.

Load-bearing premise

The trees must keep the splitting sample separate from the estimation sample inside each node.

What would settle it

A regression function and data distribution where a non-honest tree or forest fails to converge almost surely to the true function.

read the original abstract

We study various types of consistency of honest decision trees and random forests in the regression setting. In contrast to related literature, our proofs are elementary and follow the classical arguments used for smoothing methods. Under mild regularity conditions on the regression function and data distribution, we establish weak and almost sure convergence of honest trees and honest forest averages to the true regression function, and moreover we obtain uniform convergence over compact covariate domains. The framework naturally accommodates ensemble variants based on subsampling and also a two-stage bootstrap sampling scheme. Our treatment synthesizes and simplifies existing analyses, in particular recovering several results as special cases. The elementary nature of the arguments clarifies the close relationship between data-adaptive partitioning and kernel-type methods, providing an accessible approach to understanding the asymptotic behavior of tree-based methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Elementary proofs for honest tree and forest consistency that recover known results, but uniform convergence over random partitions looks like the part needing extra checks.

read the letter

The main thing here is that the paper supplies elementary proofs for weak, almost sure, and uniform consistency of honest decision trees and honest forest averages under mild conditions on the regression function and data distribution. It adapts classical smoothing arguments, recovers several existing results as special cases, and extends naturally to subsampling ensembles and a two-stage bootstrap scheme. This makes the connection between data-adaptive partitioning and kernel-type methods clearer than in some heavier treatments. The honesty condition is used throughout to get conditional unbiasedness inside nodes, which lets the bias-variance decomposition go through in a straightforward way. That part of the argument looks clean for pointwise convergence. The softer spot is the uniform convergence claim over compact covariate domains. Because the partitions are random and data-dependent, controlling the supremum requires some uniform law of large numbers or maximal inequality that holds simultaneously over the class of admissible partitions. The abstract describes the proofs as following classical smoothing lines, but if the write-up moves from pointwise or conditional results to the uniform statement without an explicit covering, chaining, or epsilon-net step, that transition could be the weakest link. The stress-test note flags exactly this issue, and it seems worth pressing in review. This work is for statisticians and machine-learning researchers who already use trees and want an accessible theoretical account rather than a brand-new framework. A reader who cares about reliable guarantees for these estimators will get value from the synthesis. It deserves a serious referee because the claims matter for practice and the elementary style lowers the barrier, even if the uniform part needs tighter justification.

Referee Report

1 major / 2 minor

Summary. The paper claims to establish weak, almost sure, and uniform consistency of honest decision trees and honest random forest averages to the true regression function m(x) under mild regularity conditions on m and the data distribution. Proofs are presented as elementary and following classical smoothing-method arguments; honesty (separate samples for splitting and estimation) is the key structural assumption. The framework also covers subsampling and two-stage bootstrap variants, recovering prior results as special cases.

Significance. If the derivations hold, the work supplies an accessible, unified treatment that clarifies the link between data-adaptive partitioning and kernel-type estimators. The elementary style and recovery of known results as special cases are genuine strengths for readers seeking intuition rather than the most general technical conditions.

major comments (1)

[Abstract and the section deriving uniform convergence] The uniform convergence claim (stated in the abstract and developed in the main consistency arguments) requires a uniform law of large numbers or maximal inequality that holds simultaneously over the random, data-dependent collection of admissible partitions. The classical smoothing-style bias-variance decomposition is applied conditionally on the partition, but the passage to sup_x |hat m_n(x) - m(x)| -> 0 appears to rely only on continuity of m and pointwise convergence without an explicit covering-number, chaining, or epsilon-net argument controlling bin-diameter variation uniformly. This step is load-bearing for the uniform result.

minor comments (2)

[Notation and assumptions] Clarify the precise measurability conditions needed when taking suprema over data-dependent partitions; a short remark on the sigma-algebra generated by the splitting sample would help.
[Extensions] The statement that the framework 'naturally accommodates' subsampling and bootstrap variants would benefit from a short dedicated paragraph showing how the honesty condition is preserved under these schemes.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the positive overall assessment and for the detailed comment on the uniform convergence argument. We address the concern directly below and agree that additional explicit steps will strengthen the presentation.

read point-by-point responses

Referee: [Abstract and the section deriving uniform convergence] The uniform convergence claim (stated in the abstract and developed in the main consistency arguments) requires a uniform law of large numbers or maximal inequality that holds simultaneously over the random, data-dependent collection of admissible partitions. The classical smoothing-style bias-variance decomposition is applied conditionally on the partition, but the passage to sup_x |hat m_n(x) - m(x)| -> 0 appears to rely only on continuity of m and pointwise convergence without an explicit covering-number, chaining, or epsilon-net argument controlling bin-diameter variation uniformly. This step is load-bearing for the uniform result.

Authors: We thank the referee for highlighting this important technical point. In the proof of uniform consistency, we first establish that the diameters of the cells in the honest partitions converge to zero in probability (and almost surely under stronger conditions) uniformly over the compact domain; this follows from the honesty assumption together with standard properties of the splitting rule and the mild regularity conditions on the covariate distribution. Uniform continuity of m on the compact set then yields uniform vanishing of the bias term. For the stochastic term, the classical smoothing decomposition is applied conditionally on the realized partition, after which we pass to the unconditional probability by using that the number of leaves is controlled (via the tree depth or subsampling fraction) and that each cell contains a growing number of estimation points. While this structure implicitly controls the supremum, we agree that an explicit maximal inequality or discretization argument would make the passage from pointwise to uniform fully rigorous and transparent. We will therefore add a short lemma in the uniform-convergence section that supplies the required uniform bound over the data-dependent collection of partitions, using a simple covering of the compact domain. This constitutes a clarification rather than a change in the main results. revision: yes

Circularity Check

0 steps flagged

No circularity: proofs adapt classical smoothing arguments using honesty as independent structure

full rationale

The paper derives weak, almost sure, and uniform consistency of honest trees and forests by following elementary bias-variance decompositions from classical kernel smoothing, with the honesty condition (separate samples for splitting and estimation) supplying the key structural property that enables conditional unbiasedness and standard convergence arguments. No step reduces a claimed prediction or result to a fitted quantity defined from the same data by construction, and the treatment synthesizes prior analyses without load-bearing self-citations that would make the central claims equivalent to their inputs. The uniform convergence claim over compact domains is presented as following from mild regularity conditions on the regression function and data distribution rather than any self-referential definition.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on standard nonparametric regression assumptions plus the honesty property of the partitioning; no free parameters or new entities are introduced.

axioms (1)

domain assumption Mild regularity conditions on the regression function and data distribution
Invoked to obtain weak, almost sure, and uniform convergence.

pith-pipeline@v0.9.0 · 5652 in / 1032 out tokens · 40262 ms · 2026-05-21T15:51:54.637962+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

30 extracted references · 30 canonical work pages · 1 internal anchor

[1]

Analysis of purely random forests bias

ARLOT, S. and GENUER, R. Analysis of purely random forests bias.arXiv:1407.3939

work page internal anchor Pith review Pith/arXiv arXiv
[2]

and IMBENS, G

ATHEY, S. and IMBENS, G. Recursive partitioning for heterogeneous causal effects.Proceedings of the National Academy of Sciences113 7353-7360

work page
[3]

Analysis of a Random Forest Model.Journal of Machine Learning Research131063-1095

BIAU, G. Analysis of a Random Forest Model.Journal of Machine Learning Research131063-1095

work page
[4]

and LUGOSI, G

BIAU, G., DEVROYE, L. and LUGOSI, G. Consistency of Random Forests and Other Averaging Classifiers.Journal of Machine Learning Research92015–2033

work page 2033
[5]

I.Measure Theory1, 1 ed

BOGACHEV, V. I.Measure Theory1, 1 ed. Springer Berlin, Heidelberg

work page
[6]

Random Forests.Machine Learning455-32

BREIMAN, L. Random Forests.Machine Learning455-32

work page
[7]

Bagging Predictors.Machine Learning24123-140

BREIMAN, L. Bagging Predictors.Machine Learning24123-140

work page
[8]

BREIMAN, L., FRIEDMAN, J., OLSHEN, R. A. and STONE, C. J.Classification and Regression Trees, 1 ed. Chapman and Hall/CRC

work page
[9]

D., CHANDAK, R

CATTANEO, M. D., CHANDAK, R. and KLUSOWSKI, J. M. Convergence Rates of Oblique Regression Trees for Flexible Function Libraries. The Annals of Statistics52466-490

work page
[10]

Bounds for the uniform deviation of empirical measures.Journal of Multivariate Analysis1272-79

DEVROYE, L. Bounds for the uniform deviation of empirical measures.Journal of Multivariate Analysis1272-79

work page
[11]

and LUGOSI, G.A Probabilistic Theory of Pattern Recognition

DEVROYE, L., GYÖRFI, L. and LUGOSI, G.A Probabilistic Theory of Pattern Recognition. Springer-Verlag New York Inc

work page
[12]

and SCORNET, E

DUROUX, R. and SCORNET, E. Impact of subsampling and tree depth on random forests.ESAIM: Probability and Statistics2296-128

work page
[13]

Bootstrap Methods: Another Look at the Jackknife.The Annals of Statistics71-26

EFRON, B. Bootstrap Methods: Another Look at the Jackknife.The Annals of Statistics71-26

work page
[14]

Variance Reduction in Purely Random Forests.Journal of Nonparametric Statistics24543–562

GENUER, R. Variance Reduction in Purely Random Forests.Journal of Nonparametric Statistics24543–562

work page
[15]

John Wiley & Sons Inc

GHOSH, S.Kernel Smoothing: Principles, methods and applications, 1 ed. John Wiley & Sons Inc

work page
[16]

and KOGALUR, U

ISHWARAN, H. and KOGALUR, U. B. Consistency of random survival forests.Statistics and Probability Letters801056-1064

work page
[17]

B., BLACKSTONE, E

ISHWARAN, H., KOGALUR, U. B., BLACKSTONE, E. H. and LAUER, M. S. Random Survival Forests.The Annals of Applied Statistics2 841-860

work page
[18]

KLUSOWSKI, J. M. Sharp Analysis of a Simple Model for Random Forests.Proceedings of The 24th International Conference on Artificial Intelligence and Statistics130757-765

work page
[19]

and JEON, Y

LIN, Y. and JEON, Y. Random Forests and Adaptive Nearest Neighbors.Journal of the American Statistical Association101578-590

work page
[20]

MEINSHAUSEN, N. (2006). Quantile Regression Forests.Journal of Machine Learning Research7983-999

work page 2006
[21]

and HOOKER, G

MENTCH, L. and HOOKER, G. Quantifying Uncertainty in Random Forests via Confidence Intervals and Hypothesis Tests.Journal of Machine Learning Research171-41

work page
[22]

General formulas for the central and non-central moments of the multinomial distribution

OUIMET, F. General formulas for the central and non-central moments of the multinomial distribution

work page
[23]

and MENTCH, L

PENG, W., COLEMAN, T. and MENTCH, L. (2022). Rates of convergence for random forests via generalized U-statistics.Electronic Journal of Statistics16232-292

work page 2022
[24]

McGraw-Hill Book Co

RUDIN, W.Real and complex analysis, 3 ed. McGraw-Hill Book Co

work page
[25]

On the asymptotics of random forests.Journal of Multivariate Analysis14672-83

SCORNET, E. On the asymptotics of random forests.Journal of Multivariate Analysis14672-83

work page
[26]

Random forests and kernel methods.IEEE Transactions on Information Theory621485-1500

SCORNET, E. Random forests and kernel methods.IEEE Transactions on Information Theory621485-1500. [27]VAN DERVAART, A. W. and WELLNER, J. A.Weak Convergence and Empirical Processes. Springer

work page
[27]

and ATHEY, S

WAGER, S. and ATHEY, S. Estimation and Inference of Heterogeneous Treatment Effects using Random Forests.Journal of the American Statistical Association1131228–1242

work page
[28]

and EFRON, B

WAGER, S., HASTIE, T. and EFRON, B. Confidence Intervals for Random Forests: The Jackknife and the Infinitesimal Jackknife.Journal of Machine Learning Research151625-1651

work page
[29]

WU, C. F. J. Jackknife, Bootstrap and Other Resampling Methods in Regression Analysis.The Annals of Statistics141261-1295

work page
[30]

and NORDMAN, D

ZHANG, H., ZIMMERMAN, J., NETTLETON, D. and NORDMAN, D. J. (2020). Random Forest Prediction Intervals.The American Statistician 74392-406. Consistency of Honest Decision Trees and Random Forests35 Appendix A: Additional results A.1. A concentration inequality This appendix contains additional helpful results not included in the main text. Let(Ω,F,P)denote...

work page 2020

[1] [1]

Analysis of purely random forests bias

ARLOT, S. and GENUER, R. Analysis of purely random forests bias.arXiv:1407.3939

work page internal anchor Pith review Pith/arXiv arXiv

[2] [2]

and IMBENS, G

ATHEY, S. and IMBENS, G. Recursive partitioning for heterogeneous causal effects.Proceedings of the National Academy of Sciences113 7353-7360

work page

[3] [3]

Analysis of a Random Forest Model.Journal of Machine Learning Research131063-1095

BIAU, G. Analysis of a Random Forest Model.Journal of Machine Learning Research131063-1095

work page

[4] [4]

and LUGOSI, G

BIAU, G., DEVROYE, L. and LUGOSI, G. Consistency of Random Forests and Other Averaging Classifiers.Journal of Machine Learning Research92015–2033

work page 2033

[5] [5]

I.Measure Theory1, 1 ed

BOGACHEV, V. I.Measure Theory1, 1 ed. Springer Berlin, Heidelberg

work page

[6] [6]

Random Forests.Machine Learning455-32

BREIMAN, L. Random Forests.Machine Learning455-32

work page

[7] [7]

Bagging Predictors.Machine Learning24123-140

BREIMAN, L. Bagging Predictors.Machine Learning24123-140

work page

[8] [8]

BREIMAN, L., FRIEDMAN, J., OLSHEN, R. A. and STONE, C. J.Classification and Regression Trees, 1 ed. Chapman and Hall/CRC

work page

[9] [9]

D., CHANDAK, R

CATTANEO, M. D., CHANDAK, R. and KLUSOWSKI, J. M. Convergence Rates of Oblique Regression Trees for Flexible Function Libraries. The Annals of Statistics52466-490

work page

[10] [10]

Bounds for the uniform deviation of empirical measures.Journal of Multivariate Analysis1272-79

DEVROYE, L. Bounds for the uniform deviation of empirical measures.Journal of Multivariate Analysis1272-79

work page

[11] [11]

and LUGOSI, G.A Probabilistic Theory of Pattern Recognition

DEVROYE, L., GYÖRFI, L. and LUGOSI, G.A Probabilistic Theory of Pattern Recognition. Springer-Verlag New York Inc

work page

[12] [12]

and SCORNET, E

DUROUX, R. and SCORNET, E. Impact of subsampling and tree depth on random forests.ESAIM: Probability and Statistics2296-128

work page

[13] [13]

Bootstrap Methods: Another Look at the Jackknife.The Annals of Statistics71-26

EFRON, B. Bootstrap Methods: Another Look at the Jackknife.The Annals of Statistics71-26

work page

[14] [14]

Variance Reduction in Purely Random Forests.Journal of Nonparametric Statistics24543–562

GENUER, R. Variance Reduction in Purely Random Forests.Journal of Nonparametric Statistics24543–562

work page

[15] [15]

John Wiley & Sons Inc

GHOSH, S.Kernel Smoothing: Principles, methods and applications, 1 ed. John Wiley & Sons Inc

work page

[16] [16]

and KOGALUR, U

ISHWARAN, H. and KOGALUR, U. B. Consistency of random survival forests.Statistics and Probability Letters801056-1064

work page

[17] [17]

B., BLACKSTONE, E

ISHWARAN, H., KOGALUR, U. B., BLACKSTONE, E. H. and LAUER, M. S. Random Survival Forests.The Annals of Applied Statistics2 841-860

work page

[18] [18]

KLUSOWSKI, J. M. Sharp Analysis of a Simple Model for Random Forests.Proceedings of The 24th International Conference on Artificial Intelligence and Statistics130757-765

work page

[19] [19]

and JEON, Y

LIN, Y. and JEON, Y. Random Forests and Adaptive Nearest Neighbors.Journal of the American Statistical Association101578-590

work page

[20] [20]

MEINSHAUSEN, N. (2006). Quantile Regression Forests.Journal of Machine Learning Research7983-999

work page 2006

[21] [21]

and HOOKER, G

MENTCH, L. and HOOKER, G. Quantifying Uncertainty in Random Forests via Confidence Intervals and Hypothesis Tests.Journal of Machine Learning Research171-41

work page

[22] [22]

General formulas for the central and non-central moments of the multinomial distribution

OUIMET, F. General formulas for the central and non-central moments of the multinomial distribution

work page

[23] [23]

and MENTCH, L

PENG, W., COLEMAN, T. and MENTCH, L. (2022). Rates of convergence for random forests via generalized U-statistics.Electronic Journal of Statistics16232-292

work page 2022

[24] [24]

McGraw-Hill Book Co

RUDIN, W.Real and complex analysis, 3 ed. McGraw-Hill Book Co

work page

[25] [25]

On the asymptotics of random forests.Journal of Multivariate Analysis14672-83

SCORNET, E. On the asymptotics of random forests.Journal of Multivariate Analysis14672-83

work page

[26] [26]

Random forests and kernel methods.IEEE Transactions on Information Theory621485-1500

SCORNET, E. Random forests and kernel methods.IEEE Transactions on Information Theory621485-1500. [27]VAN DERVAART, A. W. and WELLNER, J. A.Weak Convergence and Empirical Processes. Springer

work page

[27] [27]

and ATHEY, S

WAGER, S. and ATHEY, S. Estimation and Inference of Heterogeneous Treatment Effects using Random Forests.Journal of the American Statistical Association1131228–1242

work page

[28] [28]

and EFRON, B

WAGER, S., HASTIE, T. and EFRON, B. Confidence Intervals for Random Forests: The Jackknife and the Infinitesimal Jackknife.Journal of Machine Learning Research151625-1651

work page

[29] [29]

WU, C. F. J. Jackknife, Bootstrap and Other Resampling Methods in Regression Analysis.The Annals of Statistics141261-1295

work page

[30] [30]

and NORDMAN, D

ZHANG, H., ZIMMERMAN, J., NETTLETON, D. and NORDMAN, D. J. (2020). Random Forest Prediction Intervals.The American Statistician 74392-406. Consistency of Honest Decision Trees and Random Forests35 Appendix A: Additional results A.1. A concentration inequality This appendix contains additional helpful results not included in the main text. Let(Ω,F,P)denote...

work page 2020