Revisiting Forest Proximities via Sparse Leaf-Incidence Kernels
Pith reviewed 2026-05-16 17:50 UTC · model grok-4.3
The pith
Decision forest proximities admit an exact sparse factorization through separable weighted leaf-collision kernels.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By defining the Separable Weighted Leaf-Collision (SWLC) kernel class, the authors prove that most existing forest proximity definitions differ only in how they weight leaf collisions while sharing the same sparse incidence structure between samples and leaves. This common structure produces an exact factorization of the finite-sample proximity matrix as a product of the leaf-incidence matrix with its transpose, scaled by a diagonal matrix of leaf weights, without ever computing all pairwise comparisons.
What carries the argument
The Separable Weighted Leaf-Collision (SWLC) kernel, which expresses proximity as a sum over shared leaves with sample- and leaf-dependent weights, enabling the sparse factorization.
Load-bearing premise
That existing forest proximity definitions can be expressed exactly as separable weighted sums over leaf collisions and that the leaf-incidence matrix remains sufficiently sparse under standard forest hyperparameters.
What would settle it
A counter-example where a standard forest proximity cannot be written as a separable weighted leaf-collision kernel, or an empirical test showing quadratic scaling instead of near-linear when the number of samples grows while keeping forest size fixed.
Figures
read the original abstract
Decision forests induce supervised similarities through the partition structure of their trees. Yet forest proximity computation is still often treated as a quadratic operation in the number of samples, which limits scalability and restricts broader use in kernel and representation-learning pipelines. We introduce a unified view of leaf-collision forest proximities through a class of Separable Weighted Leaf-Collision (SWLC) kernels, showing that most existing proximities differ only in their weighting scheme while sharing a common sparse leaf-incidence structure. This yields an explicit leaf-space representation that clarifies their kernel interpretation and leads to an exact finite-sample sparse factorization of the proximity matrix, avoiding an explicit all-pairs comparison and reducing computation to sparse linear algebra over leaf collisions. We implement this framework in a memory-efficient Python library and show, both theoretically and empirically, that exact kernel computation scales near-linearly in time and memory under standard forest regimes. Benchmarks verify the predicted scaling behavior in practice across datasets, proximity definitions, and forest settings, and show that the resulting sparse leaf-space representation can also be used directly for fast task-aware embedding.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript unifies existing forest proximity definitions under a class of Separable Weighted Leaf-Collision (SWLC) kernels. It derives an exact finite-sample factorization K = L W L^T of the proximity matrix from the sparse leaf-incidence matrix L, reducing computation to sparse linear algebra over leaf collisions without explicit all-pairs comparisons. The work provides theoretical arguments and empirical benchmarks showing near-linear time and memory scaling under standard forest regimes, along with a memory-efficient implementation and direct use of the leaf-space representation for task-aware embeddings.
Significance. If the sparsity assumptions hold, the exact factorization and its reduction to sparse linear algebra constitute a clear practical advance for scalable kernel methods and embeddings based on decision forests. The parameter-free derivation from the leaf-incidence definition and the reproducible scaling benchmarks are notable strengths that clarify the kernel view of forest proximities.
major comments (2)
- [theoretical analysis and scaling discussion] The central scalability claim (near-linear time/memory via sparse factorization) rests on the leaf-incidence matrix L remaining sufficiently sparse. No explicit bound is given on nnz(L) as a function of n_samples, max_depth, or n_estimators that would guarantee the claimed asymptotic regime outside narrow 'standard' hyperparameter settings.
- [unified view of SWLC kernels] The weakest assumption—that existing proximities are exactly expressible as separable weighted sums over leaf collisions—is used to justify the unified SWLC class, yet the manuscript does not provide a counter-example or proof sketch showing that all common proximity variants (e.g., those with non-separable terms) fall inside this class.
minor comments (2)
- [factorization derivation] The definition of the weighting matrix W should be stated explicitly with its dimensions and construction rule immediately after the factorization equation.
- [empirical benchmarks] Figure captions for the scaling plots should include the exact hyperparameter ranges tested (depth, n_estimators, dataset sizes) to allow direct verification of the 'standard regimes' claim.
Simulated Author's Rebuttal
We thank the referee for their thoughtful review and the opportunity to clarify the theoretical foundations of our work. We address the major comments point by point below.
read point-by-point responses
-
Referee: The central scalability claim (near-linear time/memory via sparse factorization) rests on the leaf-incidence matrix L remaining sufficiently sparse. No explicit bound is given on nnz(L) as a function of n_samples, max_depth, or n_estimators that would guarantee the claimed asymptotic regime outside narrow 'standard' hyperparameter settings.
Authors: We appreciate this observation. In decision forests, each sample is assigned to exactly one leaf per tree, so the leaf-incidence matrix L has exactly nnz(L) = n_samples x n_estimators nonzeros, independent of max_depth. This bound is linear in the number of samples and estimators. We will add an explicit statement of this bound and a discussion of the conditions under which the near-linear scaling holds in the revised manuscript. revision: yes
-
Referee: The weakest assumption—that existing proximities are exactly expressible as separable weighted sums over leaf collisions—is used to justify the unified SWLC class, yet the manuscript does not provide a counter-example or proof sketch showing that all common proximity variants (e.g., those with non-separable terms) fall inside this class.
Authors: The manuscript focuses on standard proximity definitions (e.g., Breiman's original and common variants), which we show are exactly SWLC kernels by deriving their weighting schemes explicitly. We do not claim universality over all conceivable variants; non-separable terms would indeed lie outside the class. To clarify the scope, we will include a short discussion noting that the SWLC class encompasses the separable weighted leaf-collision proximities prevalent in the literature, with a brief example of a non-separable variant that falls outside. revision: yes
Circularity Check
No circularity: factorization follows directly from leaf-incidence definition
full rationale
The paper defines SWLC kernels from the standard leaf-collision structure of decision forests and derives the sparse factorization K = L W L^T as an algebraic identity on the leaf-incidence matrix L. This is a direct rewriting of the proximity definition rather than a self-referential loop, fitted parameter renamed as prediction, or load-bearing self-citation. No step reduces the claimed result to its own inputs by construction; the derivation remains self-contained against the external definition of forest proximities.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Leaf-incidence matrix between samples and leaves is sparse under standard forest construction
- domain assumption Proximity definitions can be written as separable weighted sums over leaf collisions
invented entities (1)
-
Separable Weighted Leaf-Collision (SWLC) kernel
no independent evidence
Reference graph
Works this paper leans on
- [1]
-
[2]
L. Breiman. Random forests.Machine learning, 45(1):5–32, 2001
work page 2001
-
[3]
L. Breiman and A. Cutler. Manual on setting up, using, and understanding random forests v4.0. Technical report, University of California, Berkeley, Department of Statis- tics, 2003
work page 2003
-
[4]
L. Duvaux, Q. Geissmann, K. Gharbi, J.-J. Zhou, J. Ferrari, C. M. Smadja, and R. K. Butlin. Dynamics of Copy Number Variation in Host Races of the Pea Aphid. Molecular Biology and Evolution, 32(1):63–80, Jan. 2015. ISSN 1537-1719, 0737-4038. doi: 10.1093/molbev/msu266
-
[5]
S. Farokhi, H. Chen, K. Moon, and H. Karimi. Advancing Tabular Data Classification with Graph Neural Networks: A Random Forest Proximity Method. In2024 IEEE International Conference on Big Data (BigData), pages 7011–7020, Dec. 2024. doi: 10.1109/BigData62323.2024.10825972
-
[6]
J. H. Friedman. Greedy Function Approximation: A Gradient Boosting Machine.The Annals of Statistics, 29(5):1189–1232, 2001. ISSN 0090-5364
work page 2001
-
[7]
Extremely randomized trees.Machine Learning, 63(1):3–42, 2006
P. Geurts, D. Ernst, and L. Wehenkel. Extremely randomized trees.Machine Learning, 63(1):3–42, Apr. 2006. ISSN 1573-0565. doi: 10.1007/s10994-006-6226-1
-
[8]
F. G. Gustavson. Two Fast Algorithms for Sparse Matrices: Multiplication and Per- muted Transposition.ACM Transactions on Mathematical Software, 4(3):250–269, Sept. 1978. ISSN 0098-3500, 1557-7295. doi: 10.1145/355791.355796
-
[9]
K. Hassine, A. Erbad, and R. Hamila. Important Complexity Reduction of Random Forest in Multi-Classification Problem. In2019 15th International Wireless Commu- nications & Mobile Computing Conference (IWCMC), pages 226–231, June 2019. doi: 10.1109/IWCMC.2019.8766544
-
[10]
J. Lee. Patient-Specific Predictive Modeling Using Random Forests: An Observational Study for the Critically Ill.JMIR Medical Informatics, 5(1):e6690, Jan. 2017. doi: 10.2196/medinform.6690
-
[11]
G. Louppe. Understanding Random Forests: From Theory to Practice, June 2015
work page 2015
-
[12]
J. S. Rhodes and A. G. Rustad. Random Forest-Supervised Manifold Alignment. In 2024 IEEE International Conference on Big Data (BigData), pages 3309–3312. IEEE Computer Society, Dec. 2024. ISBN 979-8-3503-6248-0. doi: 10.1109/BigData62323. 2024.10825663. 7 Aumon, Wolf, Moon and Rhodes
-
[13]
J. S. Rhodes, A. Aumon, S. Morin, M. Girard, C. Larochelle, B. Lahav, E. Brunet- Ratnasingham, W. Zhang, A. Cutler, A. Zhou, D. E. Kaufmann, S. Zandee, A. Prat, G. Wolf, and K. R. Moon. Gaining Biological Insights through Supervised Data Visu- alization, Nov. 2023
work page 2023
-
[14]
J. S. Rhodes, A. Cutler, and K. R. Moon. Geometry- and Accuracy-Preserving Random Forest Proximities.IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(9):10947–10959, Sept. 2023. ISSN 1939-3539. doi: 10.1109/TPAMI.2023.3263774
-
[15]
B. Shaw, J. Rhodes, S. F. Boubrahimi, and K. R. Moon. Forest Proximities for Time Series. In K. Arai, editor,Intelligent Systems and Applications, pages 588–598, Cham, 2026. Springer Nature Switzerland. ISBN 978-3-032-07109-5. doi: 10.1007/978-3-032-07109-5 40
-
[16]
S. Tan, M. Soloviev, G. Hooker, and M. T. Wells. Tree Space Prototypes: Another Look at Making Tree Ensembles Interpretable. InProceedings of the 2020 ACM-IMS on Foundations of Data Science Conference, FODS ’20, pages 23–34, New York, NY, USA, Oct. 2020. Association for Computing Machinery. ISBN 978-1-4503-8103-1. doi: 10.1145/3412815.3416893
-
[17]
P. Virtanen, R. Gommers, T. E. Oliphant, M. Haberland, T. Reddy, D. Cournapeau, E. Burovski, P. Peterson, W. Weckesser, J. Bright, S. J. van der Walt, M. Brett, J. Wilson, K. J. Millman, N. Mayorov, A. R. J. Nelson, E. Jones, R. Kern, E. Larson, C. J. Carey, ˙I. Polat, Y. Feng, E. W. Moore, J. VanderPlas, D. Laxalde, J. Perk- told, R. Cimrman, I. Henrikse...
-
[18]
doi: 10.1038/s41592-019-0686-2
-
[19]
H. Xiao, K. Rasul, and R. Vollgraf. Fashion-MNIST: A Novel Image Dataset for Benchmarking Machine Learning Algorithms, Sept. 2017. 8 Scalable Tree Ensemble Proximities Appendix A. Proofs Proof[Proposition 1] Let the columns ofQandWbe indexed byk∈ K. Since each leaf kbelongs to exactly one tree, lett(k) be the tree index containing leafk. We define the ent...
work page 2017
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.