CLARITree: Cholesky and Lookahead Accelerations for Regression with Interpretable Piecewise Linear Trees

Cynthia Rudin; Hayden McTavish; Margo Seltzer; Varun Babbar; Yixiao Wang

arxiv: 2606.12840 · v1 · pith:TJLUNSR3new · submitted 2026-06-11 · 💻 cs.LG

CLARITree: Cholesky and Lookahead Accelerations for Regression with Interpretable Piecewise Linear Trees

Yixiao Wang , Hayden McTavish , Varun Babbar , Margo Seltzer , Cynthia Rudin This is my paper

Pith reviewed 2026-06-27 07:43 UTC · model grok-4.3

classification 💻 cs.LG

keywords regression treespiecewise linearlookahead searchCholesky updatesinterpretable modelssparse regressionoptimal trees

0 comments

The pith

A new algorithm builds near-optimal sparse piecewise linear regression trees by pairing lookahead search with rank-one Cholesky updates to the Gram matrix.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops an algorithm for constructing regression trees that use piecewise linear fits, remain sparse, and come close to optimal performance. It replaces slow exact search with a lookahead strategy while accelerating the linear regression steps inside each node via rank-one Cholesky updates. The goal is to close the gap between fast but suboptimal greedy trees and accurate but prohibitively expensive optimal methods. Theoretical arguments and experiments are offered to show that the resulting trees improve the balance among speed, accuracy, and sparsity on regression tasks.

Core claim

The algorithm achieves near-optimal, sparse, piecewise linear regression trees by combining a lookahead-style search strategy with efficient rank-one Cholesky updates of the Gram matrix, delivering a favorable trade-off between computational efficiency, predictive accuracy, and sparsity while scaling significantly better than prior optimal approaches.

What carries the argument

Lookahead-style search strategy combined with efficient rank-one Cholesky updates of the Gram matrix

If this is right

The method scales to larger datasets than dynamic programming or branch-and-bound approaches for linear regression trees.
It produces trees with better sparsity while retaining predictive accuracy comparable to optimal solutions.
Theoretical analysis supports that the constructed trees remain near-optimal under the combined search and update scheme.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same acceleration pattern could be tested on classification trees or other split criteria.
Varying the lookahead depth as a function of node size might further tune the efficiency-accuracy curve.
The approach could reduce reliance on post-hoc pruning steps in tree construction pipelines.

Load-bearing premise

The lookahead strategy combined with Cholesky updates preserves near-optimality without introducing significant approximation error.

What would settle it

On small datasets where exact optimal trees can be enumerated, measure whether the method's trees show substantially higher squared error or lower sparsity than the true optimum.

Figures

Figures reproduced from arXiv: 2606.12840 by Cynthia Rudin, Hayden McTavish, Margo Seltzer, Varun Babbar, Yixiao Wang.

**Figure 1.** Figure 1: An illustration of CLARITree on a synthetic piecewiselinear dataset. Training objective versus runtime is shown. Greedy trees are fast but suboptimal, while optimal trees take too long to run. CLARITree provides a strong trade-off, achieving nearoptimal performance with much lower runtime. The reported results include both train and test performance. Test performance: Greedy (MSE = 15.41, R 2 = 0.88); CL… view at source ↗

**Figure 2.** Figure 2: Comparison of accuracy, complexity, and efficiency across representative datasets. Panels (a)–(b) show the trade-off between test R 2 and training time (top) as well as test R 2 and the number of leaves (bottom) on two representative datasets. Our proposed CLARITree (blue) consistently achieves strong accuracy–sparsity trade-offs while remaining highly efficient, often approaching or closely matching the o… view at source ↗

**Figure 3.** Figure 3: (a) Training completion rate under a 10-minute budget. Empirical completion curves aggregated across all datasets. The dashed vertical line marks the 600 s time limit used in our default protocol. (b) Scalability of continuous split evaluation. Ablation test on synthetic data showing speedup for CLARITree using rank-one updates relative to CLARITree without rank-one updates. sion benchmarks hosted on Kagg… view at source ↗

**Figure 4.** Figure 4: Empirical performance gap under the synthetic DGP for varying ε with U = 8, depth = 4, and n = 1000. As ε decreases, the gap between greedy regression trees and CLARITree becomes increasingly pronounced. 29 [PITH_FULL_IMAGE:figures/full_fig_p029_4.png] view at source ↗

**Figure 5.** Figure 5: Performance of CLARITree and baselines on Airfoil. Each trained with a maximum tree depth of 4. Results averaged over five random 80/20 splits, with error bars showing ± 1 standard deviation. The dashed vertical line in the top row denotes the default time limit of 10 minutes. 41 [PITH_FULL_IMAGE:figures/full_fig_p041_5.png] view at source ↗

**Figure 6.** Figure 6: Performance of CLARITree and baselines on Auction. Each trained with a maximum tree depth of 4. Results averaged over five random 80/20 splits, with error bars showing ± 1 standard deviation. The dashed vertical line in the top row denotes the default time limit of 10 minutes. 42 [PITH_FULL_IMAGE:figures/full_fig_p042_6.png] view at source ↗

**Figure 7.** Figure 7: Performance of CLARITree and baselines on Auto Mpg. Each trained with a maximum tree depth of 4. Results averaged over five random 80/20 splits, with error bars showing ± 1 standard deviation. The dashed vertical line in the top row denotes the default time limit of 10 minutes. 43 [PITH_FULL_IMAGE:figures/full_fig_p043_7.png] view at source ↗

**Figure 8.** Figure 8: Performance of CLARITree and baselines on Energy (Cooling). Each trained with a maximum tree depth of 4. Results averaged over five random 80/20 splits, with error bars showing ± 1 standard deviation. The dashed vertical line in the top row denotes the default time limit of 10 minutes. 44 [PITH_FULL_IMAGE:figures/full_fig_p044_8.png] view at source ↗

**Figure 9.** Figure 9: Performance of CLARITree and baselines on Energy (Heating). Each trained with a maximum tree depth of 4. Results averaged over five random 80/20 splits, with error bars showing ± 1 standard deviation. The dashed vertical line in the top row denotes the default time limit of 10 minutes. 45 [PITH_FULL_IMAGE:figures/full_fig_p045_9.png] view at source ↗

**Figure 10.** Figure 10: Performance of CLARITree and baselines on Insurance. Each trained with a maximum tree depth of 4. Results averaged over five random 80/20 splits, with error bars showing ± 1 standard deviation. The dashed vertical line in the top row denotes the default time limit of 10 minutes. 46 [PITH_FULL_IMAGE:figures/full_fig_p046_10.png] view at source ↗

**Figure 11.** Figure 11: Performance of CLARITree and baselines on Optical Net. Each trained with a maximum tree depth of 4. Results averaged over five random 80/20 splits, with error bars showing ± 1 standard deviation. The dashed vertical line in the top row denotes the default time limit of 10 minutes. 47 [PITH_FULL_IMAGE:figures/full_fig_p047_11.png] view at source ↗

**Figure 12.** Figure 12: Performance of CLARITree and baselines on Real Estate. Each trained with a maximum tree depth of 4. Results averaged over five random 80/20 splits, with error bars showing ± 1 standard deviation. The dashed vertical line in the top row denotes the default time limit of 10 minutes. 48 [PITH_FULL_IMAGE:figures/full_fig_p048_12.png] view at source ↗

**Figure 13.** Figure 13: Performance of CLARITree and baselines on Servo. Each trained with a maximum tree depth of 4. Results averaged over five random 80/20 splits, with error bars showing ± 1 standard deviation. The dashed vertical line in the top row denotes the default time limit of 10 minutes. Note that Servo is an extremely small dataset with only a limited number of unique feature values (at most 7 per feature), leading t… view at source ↗

**Figure 14.** Figure 14: Performance of CLARITree and baselines on Synch. Each trained with a maximum tree depth of 4. Results averaged over five random 80/20 splits, with error bars showing ± 1 standard deviation. The dashed vertical line in the top row denotes the default time limit of 10 minutes. Note that Synch can be fitted extremely well using only a single-feature regression. STreeD relies on iterative optimization for nod… view at source ↗

**Figure 15.** Figure 15: Performance of CLARITree and baselines on Yacht. Each trained with a maximum tree depth of 4. Results averaged over five random 80/20 splits, with error bars showing ± 1 standard deviation. The dashed vertical line in the top row denotes the default time limit of 10 minutes. 51 [PITH_FULL_IMAGE:figures/full_fig_p051_15.png] view at source ↗

**Figure 16.** Figure 16: Performance of CLARITree and baselines on California Housing. Each trained with a maximum tree depth of 4. Results averaged over five random 80/20 splits, with error bars showing ± 1 standard deviation. The dashed vertical line in the top row denotes the default time limit of 10 minutes. 52 [PITH_FULL_IMAGE:figures/full_fig_p052_16.png] view at source ↗

**Figure 17.** Figure 17: Performance of CLARITree and baselines on Temperature (Max) and Temperature (Min). Each trained with a maximum tree depth of 4. Results averaged over five random 80/20 splits, with error bars showing ± 1 standard deviation. The dashed vertical line in the top row denotes the default time limit of 10 minutes. 53 [PITH_FULL_IMAGE:figures/full_fig_p053_17.png] view at source ↗

**Figure 18.** Figure 18: Performance of CLARITree and baselines on Seoul Bike and Walmart. Each trained with a maximum tree depth of 4. Results averaged over five random 80/20 splits, with error bars showing ± 1 standard deviation. The dashed vertical line in the top row denotes the default time limit of 10 minutes. 54 [PITH_FULL_IMAGE:figures/full_fig_p054_18.png] view at source ↗

**Figure 19.** Figure 19: Training completion rate under a 10-minute budget. Empirical completion curves aggregated across all small/medium-scale datasets. The dashed vertical line marks the 600 s time limit used in our default protocol. E.4. Completion Ratio In this section, we report the completion ratio for methods using full thresholds, focusing on small- and medium-scale datasets. Even in this regime, we observe that STreeD c… view at source ↗

**Figure 20.** Figure 20: Performance of constant regression variant of CLARITree and baselines on additional datasets (part 1). Each trained with a maximum tree depth of 5. Results averaged over five random 80/20 splits, with error bars showing ± 1 standard deviation. The dashed vertical line in the top row denotes the default time limit of 10 minutes. Small / Medium-Scale Datasets 61 [PITH_FULL_IMAGE:figures/full_fig_p061_20.png] view at source ↗

**Figure 21.** Figure 21: Performance of constant regression variant of CLARITree and baselines on additional datasets (part 2). Each trained with a maximum tree depth of 5. Results averaged over five random 80/20 splits, with error bars showing ± 1 standard deviation. The dashed vertical line in the top row denotes the default time limit of 10 minutes. 62 [PITH_FULL_IMAGE:figures/full_fig_p062_21.png] view at source ↗

**Figure 22.** Figure 22: Performance of constant regression variant of CLARITree and baselines on additional datasets (part 3). Each trained with a maximum tree depth of 5. Results averaged over five random 80/20 splits, with error bars showing ± 1 standard deviation. The dashed vertical line in the top row denotes the default time limit of 10 minutes. 63 [PITH_FULL_IMAGE:figures/full_fig_p063_22.png] view at source ↗

**Figure 23.** Figure 23: Performance of constant regression variant of CLARITree and baselines on additional datasets (part 4). Each trained with a maximum tree depth of 5. Results averaged over five random 80/20 splits, with error bars showing ± 1 standard deviation. The dashed vertical line in the top row denotes the default time limit of 10 minutes. 64 [PITH_FULL_IMAGE:figures/full_fig_p064_23.png] view at source ↗

**Figure 24.** Figure 24: Performance of constant regression variant of CLARITree and baselines on additional datasets (part 5). Each trained with a maximum tree depth of 5. Results averaged over five random 80/20 splits, with error bars showing ± 1 standard deviation. The dashed vertical line in the top row denotes the default time limit of 10 minutes. 65 [PITH_FULL_IMAGE:figures/full_fig_p065_24.png] view at source ↗

**Figure 25.** Figure 25: Performance of constant regression variant of CLARITree and baselines on additional datasets (part 6). Each trained with a maximum tree depth of 5. Results averaged over five random 80/20 splits, with error bars showing ± 1 standard deviation. The dashed vertical line in the top row denotes the default time limit of 10 minutes. 66 [PITH_FULL_IMAGE:figures/full_fig_p066_25.png] view at source ↗

**Figure 26.** Figure 26: Performance of constant regression variant of CLARITree and baselines on additional datasets (part 1). Each trained with a maximum tree depth of 5. Results averaged over five random 80/20 splits, with error bars showing ± 1 standard deviation. Dashed lines and hollow markers indicate runs that hit the prescribed time limit. The dashed vertical line in the top row denotes the default time limit of 10 minut… view at source ↗

**Figure 27.** Figure 27: Performance of constant regression variant of CLARITree and baselines on additional datasets (part 2). Each trained with a maximum tree depth of 5. Results averaged over five random 80/20 splits, with error bars showing ± 1 standard deviation. Dashed lines and hollow markers indicate runs that hit the prescribed time limit. The dashed vertical line in the top row denotes the default time limit of 10 minut… view at source ↗

**Figure 28.** Figure 28: Performance of constant regression variant of CLARITree and baselines on additional datasets (part 3). Each trained with a maximum tree depth of 5. Results averaged over five random 80/20 splits, with error bars showing ± 1 standard deviation. The dashed vertical line in the top row denotes the default time limit of 10 minutes. 69 [PITH_FULL_IMAGE:figures/full_fig_p069_28.png] view at source ↗

**Figure 29.** Figure 29: Results of the Multi-Step CLARITreeConst framework across different lookahead depths (depth = 5). We report training time, training R 2 , and test R 2 as functions of the regularization strength λ, along with a fixed λ= 0.01 slice illustrating the dependence on lookahead depth. 10 4 10 3 10 2 10 1 0 200 400 600 Train time (s) Train Time Vs 1 2 3 4 5 Lookahead depth 0 200 400 600 Train time (s) Train Time … view at source ↗

**Figure 30.** Figure 30: Results of the Multi-Step CLARITreeConst framework across different lookahead depths (depth = 5). We report training time, training R 2 , and test R 2 as functions of the regularization strength λ, along with a fixed λ= 0.01 slice illustrating the dependence on lookahead depth. Dashed lines indicate runs that hit the 600-second time limitation. 73 [PITH_FULL_IMAGE:figures/full_fig_p073_30.png] view at source ↗

read the original abstract

Regression trees are among the most interpretable yet expressive model classes in machine learning. Historically, greedy induction has been the dominant approach for constructing well-performing regression trees. While optimal methods based on dynamic programming and branch-and-bound exist, they are computationally prohibitive for general linear regression trees, despite often achieving substantially better performance than greedy approaches. Recent work has shown that specialized lookahead strategies can dramatically improve runtime while maintaining near-optimal performance, primarily in classification settings. In this work, we develop a novel algorithm for near-optimal, sparse, piecewise linear regression trees that combines a lookahead-style search strategy with efficient rank-one Cholesky updates of the Gram matrix. We demonstrate, both theoretically and empirically, that our method achieves a favorable trade-off between computational efficiency, predictive accuracy, and sparsity, and scales significantly better than the current state of the art.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper's main move is pairing lookahead search with rank-one Cholesky updates to make near-optimal piecewise linear regression trees scale better than prior optimal solvers.

read the letter

The central claim is that this combination delivers a practical speed-up for sparse piecewise linear trees while keeping performance close to optimal. That pairing looks like the actual novelty relative to earlier lookahead work that stayed in classification or constant-leaf settings.

The approach makes sense on paper. Rank-one Cholesky updates avoid recomputing the full Gram matrix at every candidate split, which directly attacks the cost of fitting linear leaves. Lookahead then limits the search depth so the method does not fall back to full dynamic programming. If those pieces fit together without extra approximation error, the result is a usable middle ground between greedy CART and exact but slow optimal trees.

The abstract states both theoretical and empirical backing for the efficiency-accuracy-sparsity trade-off. That is the part worth checking in the full text: whether the proofs cover the exact conditions under which near-optimality is preserved, and whether the experiments include proper controls against both greedy baselines and earlier optimal methods across dataset sizes.

A minor concern is that the lookahead depth and update correctness conditions are not spelled out in the summary, so any hidden post-processing or depth limits would need to be visible. The work stays within the tree-methods subfield and does not claim broader impact.

This is for readers already working on interpretable regression trees in regulated domains who need linear leaves but cannot afford full optimal solvers. It is a targeted algorithmic refinement rather than a foundational shift.

I would send it to peer review. The problem is real, the proposed fix is concrete, and the claims are falsifiable once the derivations and experiments are examined.

Referee Report

1 major / 0 minor

Summary. The paper introduces CLARITree, an algorithm for near-optimal sparse piecewise linear regression trees that integrates a lookahead-style search strategy with rank-one Cholesky updates to the Gram matrix. It claims both theoretical guarantees and empirical results showing improved computational efficiency, predictive accuracy, and sparsity relative to prior methods, along with significantly better scaling than the state of the art.

Significance. If the theoretical preservation of near-optimality and the empirical scaling claims hold, the work would provide a practical advance for interpretable regression models, narrowing the gap between greedy heuristics and exact dynamic-programming approaches in settings where piecewise-linear trees are desirable.

major comments (1)

[Abstract] Abstract: the central claim that lookahead combined with rank-one Cholesky updates 'preserves near-optimality without introducing significant approximation error' is load-bearing, yet the abstract supplies neither the lookahead depth, the precise update correctness conditions, nor an error-bound derivation; without these the claim cannot be evaluated.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their careful reading and constructive comment. We address the point on the abstract below and agree that additional specificity will improve evaluability.

read point-by-point responses

Referee: [Abstract] Abstract: the central claim that lookahead combined with rank-one Cholesky updates 'preserves near-optimality without introducing significant approximation error' is load-bearing, yet the abstract supplies neither the lookahead depth, the precise update correctness conditions, nor an error-bound derivation; without these the claim cannot be evaluated.

Authors: We agree the abstract is a high-level summary and omits the requested parameters, which are instead detailed in the body. The lookahead depth is fixed at 2 (Section 3.2), the rank-one Cholesky updates are exact (not approximate) for the Gram matrix when the feature matrix is updated by a single column (Section 4.1, Lemma 4.1), and the near-optimality guarantee with explicit additive error bound appears in Theorem 5.2. To address the concern directly, we will revise the abstract to include these three elements in a single parenthetical clause while preserving its length. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The provided abstract and context describe a novel algorithmic contribution combining lookahead search with rank-one Cholesky updates for piecewise linear regression trees, with claims of theoretical and empirical demonstration of near-optimality and efficiency. No equations, fitted parameters, self-citations, or ansatzes are shown that reduce any prediction or uniqueness claim to a definition or input by construction. The derivation chain appears self-contained against external benchmarks, with the central claims resting on independent algorithmic design rather than self-referential fitting or imported uniqueness theorems.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no explicit free parameters, axioms, or invented entities; all such elements would need to be extracted from the full manuscript.

pith-pipeline@v0.9.1-grok · 5689 in / 1089 out tokens · 17697 ms · 2026-06-27T07:43:23.458448+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

53 extracted references

[1]

Annals of Mathematical Statistics , volume =

Sherman, Jack and Morrison, Winifred J , title =. Annals of Mathematical Statistics , volume =. 1950 , doi =

1950
[2]

Machine Learning, Proceedings of the Twenty-first International Conference (ICML 2004), Banff, Alberta, Canada, July 4--8, 2004 , series =

Potts, Duncan , title =. Machine Learning, Proceedings of the Twenty-first International Conference (ICML 2004), Banff, Alberta, Canada, July 4--8, 2004 , series =. 2004 , doi =

2004
[3]

Note sur une m

Benoit, Commandant , journal=. Note sur une m. 1924 , doi =

1924
[4]

The Quarterly Journal of Mechanics and Applied Mathematics , volume=

Rounding-off errors in matrix processes , author=. The Quarterly Journal of Mechanics and Applied Mathematics , volume=. 1948 , publisher=

1948
[5]

and Teukolsky, Saul A

Press, William H. and Teukolsky, Saul A. and Vetterling, William T. and Flannery, Brian P. , publisher =. Numerical Recipes in
[6]

Low Rank Updates for the

Seeger, Matthias , institution =. Low Rank Updates for the. 2008 , url =

2008
[7]

Near-Optimal Decision Trees in a

Babbar, Varun and McTavish, Hayden and Rudin, Cynthia and Seltzer, Margo , booktitle =. Near-Optimal Decision Trees in a. 2025 , publisher =

2025
[8]

Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence (UAI 2024) , pages =

Learning Accurate and Interpretable Decision Trees , author =. Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence (UAI 2024) , pages =. 2024 , volume =

2024
[9]

Proceedings of the

Learning optimal classification trees using a binary linear program formulation , author=. Proceedings of the
[10]

arXiv preprint arXiv:2508.06064 , year =

A Generic Complete Anytime Beam Search for Optimal Decision Tree , author =. arXiv preprint arXiv:2508.06064 , year =

arXiv
[11]

Branches: Efficiently Seeking Optimal Sparse Decision Trees via

Chaouki, Ayman and Read, Jesse and Bifet, Albert , booktitle =. Branches: Efficiently Seeking Optimal Sparse Decision Trees via. 2025 , volume =

2025
[12]

Exploring the whole

Xin, Rui and Zhong, Chudi and Chen, Zhi and Takagi, Takuya and Seltzer, Margo and Rudin, Cynthia , journal=. Exploring the whole
[13]

Efficient Lookahead Decision Trees , author =. Advances in Intelligent Data Analysis XXII: 22nd International Symposium on Intelligent Data Analysis, IDA 2024, Stockholm, Sweden, April 24--26, 2024, Proceedings, Part II , series =. 2024 , publisher =

2024
[14]

Proceedings of the 40th International Conference on Machine Learning (ICML 2023) , pages =

Blossom: an Anytime Algorithm for Computing Optimal Decision Trees , author =. Proceedings of the 40th International Conference on Machine Learning (ICML 2023) , pages =. 2023 , volume =

2023
[15]

Proceedings of the

Maptree: Beating ``optimal'' decision trees with bayesian decision trees , author=. Proceedings of the
[16]

Proceedings of the 37th International Conference on Machine Learning (ICML 2020) , pages =

Generalized and Scalable Optimal Sparse Decision Trees , author =. Proceedings of the 37th International Conference on Machine Learning (ICML 2020) , pages =. 2020 , volume =

2020
[17]

Proceedings of the

Fast sparse decision tree optimization via reference ensembles , author=. Proceedings of the
[18]

Mathematics of Computation , volume=

Methods for modifying matrix factorizations , author=. Mathematics of Computation , volume=. 1974 , doi =

1974
[19]

Nature Machine Intelligence , volume=

Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead , author=. Nature Machine Intelligence , volume=. 2019 , doi =

2019
[20]

Statistics Surveys , volume=

Interpretable machine learning: Fundamental principles and 10 grand challenges , author=. Statistics Surveys , volume=. 2022 , doi =

2022
[21]

Journal of the American Statistical Association , volume=

Problems in the analysis of survey data, and a proposal , author=. Journal of the American Statistical Association , volume=. 1963 , publisher=

1963
[22]

1977 , doi =

An Algorithm for Constructing Optimal Binary Decision Trees , author =. 1977 , doi =

1977
[23]

Australian Joint Conference on Artificial Intelligence , volume=

Learning with continuous classes , author=. Australian Joint Conference on Artificial Intelligence , volume=. 1992 , organization=

1992
[24]

Statistica Sinica , volume =

Regression trees with unbiased variable selection and interaction detection , author=. Statistica Sinica , volume =
[25]

Journal of Machine Learning Research , volume=

Murtree: Optimal decision trees via dynamic programming and search , author=. Journal of Machine Learning Research , volume=
[26]

Fast linear model trees by

Raymaekers, Jakob and Rousseeuw, Peter J and Verdonck, Tim and Yao, Ruicong , journal=. Fast linear model trees by. 2024 , publisher=

2024
[27]

International Statistical Review , volume=

Fifty years of classification and regression trees , author=. International Statistical Review , volume=. 2014 , publisher=

2014
[28]

Artificial Intelligence Review , volume=

Recent advances in decision trees: an updated survey , author=. Artificial Intelligence Review , volume=. 2023 , publisher=

2023
[29]

Machine Learning , volume=

Optimal classification trees , author=. Machine Learning , volume=. 2017 , publisher=

2017
[30]

2013 , publisher=

Matrix Computations , author=. 2013 , publisher=

2013
[31]

Proceedings of the

Learning optimal decision trees using caching branch-and-bound search , author=. Proceedings of the
[32]

Proceedings of the 41st International Conference on Machine Learning (ICML 2024) , pages =

Piecewise Constant and Linear Regression Trees: An Optimal Dynamic Programming Approach , author =. Proceedings of the 41st International Conference on Machine Learning (ICML 2024) , pages =. 2024 , volume =

2024
[33]

Proceedings of the

Optimal sparse regression trees , author=. Proceedings of the
[34]

2004 , publisher=

Top-down induction of model trees with regression and splitting nodes , author=. 2004 , publisher=

2004
[35]

Advances in Neural Information Processing Systems , pages =

Optimal Sparse Decision Trees , author =. Advances in Neural Information Processing Systems , pages =
[36]

arXiv preprint arXiv:2409.12788 , year =

Optimal or Greedy Decision Trees? Revisiting their Objectives, Tuning, and Performance , author =. arXiv preprint arXiv:2409.12788 , year =

arXiv
[37]

1984 , publisher=

Classification and Regression Trees , author=. 1984 , publisher=

1984
[38]

5: Programs for Machine Learning , author=

C4. 5: Programs for Machine Learning , author=. 1993 , publisher=

1993
[39]

Advances in Neural Information Processing Systems , volume=

Harnessing the power of choices in decision tree learning , author=. Advances in Neural Information Processing Systems , volume=
[40]

Breiman meets

Kohler, Hector and Akrour, Riad and Preux, Philippe , booktitle=. Breiman meets. 2025 , doi =

2025
[41]

Mazumder, Rahul and Meng, Xiang and Wang, Haoyue , booktitle =. Quant-. 2022 , volume =

2022
[42]

Proceedings of the

Optimal Classification Trees for Continuous Feature Data Using Dynamic Programming with Branch-and-Bound , author=. Proceedings of the
[43]

Proceedings of the European Conference on Machine Learning , pages=

Inducing model trees for continuous classes , author=. Proceedings of the European Conference on Machine Learning , pages=
[44]

Proceedings of the 11th Innovations in Theoretical Computer Science Conference (ITCS 2020) , series =

Top-Down Induction of Decision Trees: Rigorous Guarantees and Inherent Limitations , author =. Proceedings of the 11th Innovations in Theoretical Computer Science Conference (ITCS 2020) , series =. 2020 , doi =

2020
[45]

Dua, Dheeru and Graff, Casey , year =
[46]

2018 , howpublished =

Choi, Miri , title =. 2018 , howpublished =

2018
[47]

Proceedings of the

Optuna: A next-generation hyperparameter optimization framework , author=. Proceedings of the. 2019 , doi =

2019
[48]

2017 , howpublished =

California Housing Prices , author =. 2017 , howpublished =

2017
[49]

2021 , howpublished =

Walmart Dataset , author =. 2021 , howpublished =

2021
[50]

Decision trees: from efficient prediction to responsible

Blockeel, Hendrik and Devos, Laurens and Fr. Decision trees: from efficient prediction to responsible. Frontiers in Artificial Intelligence , volume=. 2023 , publisher=

2023
[51]

Journal of Heuristics , volume=

Rollout algorithms for combinatorial optimization , author=. Journal of Heuristics , volume=. 1997 , publisher=

1997
[52]

Annals of Operations Research , volume=

Looking ahead with the pilot method , author=. Annals of Operations Research , volume=. 2005 , publisher=

2005
[53]

2002 , edition =

Accuracy and Stability of Numerical Algorithms , author =. 2002 , edition =

2002

[1] [1]

Annals of Mathematical Statistics , volume =

Sherman, Jack and Morrison, Winifred J , title =. Annals of Mathematical Statistics , volume =. 1950 , doi =

1950

[2] [2]

Machine Learning, Proceedings of the Twenty-first International Conference (ICML 2004), Banff, Alberta, Canada, July 4--8, 2004 , series =

Potts, Duncan , title =. Machine Learning, Proceedings of the Twenty-first International Conference (ICML 2004), Banff, Alberta, Canada, July 4--8, 2004 , series =. 2004 , doi =

2004

[3] [3]

Note sur une m

Benoit, Commandant , journal=. Note sur une m. 1924 , doi =

1924

[4] [4]

The Quarterly Journal of Mechanics and Applied Mathematics , volume=

Rounding-off errors in matrix processes , author=. The Quarterly Journal of Mechanics and Applied Mathematics , volume=. 1948 , publisher=

1948

[5] [5]

and Teukolsky, Saul A

Press, William H. and Teukolsky, Saul A. and Vetterling, William T. and Flannery, Brian P. , publisher =. Numerical Recipes in

[6] [6]

Low Rank Updates for the

Seeger, Matthias , institution =. Low Rank Updates for the. 2008 , url =

2008

[7] [7]

Near-Optimal Decision Trees in a

Babbar, Varun and McTavish, Hayden and Rudin, Cynthia and Seltzer, Margo , booktitle =. Near-Optimal Decision Trees in a. 2025 , publisher =

2025

[8] [8]

Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence (UAI 2024) , pages =

Learning Accurate and Interpretable Decision Trees , author =. Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence (UAI 2024) , pages =. 2024 , volume =

2024

[9] [9]

Proceedings of the

Learning optimal classification trees using a binary linear program formulation , author=. Proceedings of the

[10] [10]

arXiv preprint arXiv:2508.06064 , year =

A Generic Complete Anytime Beam Search for Optimal Decision Tree , author =. arXiv preprint arXiv:2508.06064 , year =

arXiv

[11] [11]

Branches: Efficiently Seeking Optimal Sparse Decision Trees via

Chaouki, Ayman and Read, Jesse and Bifet, Albert , booktitle =. Branches: Efficiently Seeking Optimal Sparse Decision Trees via. 2025 , volume =

2025

[12] [12]

Exploring the whole

Xin, Rui and Zhong, Chudi and Chen, Zhi and Takagi, Takuya and Seltzer, Margo and Rudin, Cynthia , journal=. Exploring the whole

[13] [13]

Efficient Lookahead Decision Trees , author =. Advances in Intelligent Data Analysis XXII: 22nd International Symposium on Intelligent Data Analysis, IDA 2024, Stockholm, Sweden, April 24--26, 2024, Proceedings, Part II , series =. 2024 , publisher =

2024

[14] [14]

Proceedings of the 40th International Conference on Machine Learning (ICML 2023) , pages =

Blossom: an Anytime Algorithm for Computing Optimal Decision Trees , author =. Proceedings of the 40th International Conference on Machine Learning (ICML 2023) , pages =. 2023 , volume =

2023

[15] [15]

Proceedings of the

Maptree: Beating ``optimal'' decision trees with bayesian decision trees , author=. Proceedings of the

[16] [16]

Proceedings of the 37th International Conference on Machine Learning (ICML 2020) , pages =

Generalized and Scalable Optimal Sparse Decision Trees , author =. Proceedings of the 37th International Conference on Machine Learning (ICML 2020) , pages =. 2020 , volume =

2020

[17] [17]

Proceedings of the

Fast sparse decision tree optimization via reference ensembles , author=. Proceedings of the

[18] [18]

Mathematics of Computation , volume=

Methods for modifying matrix factorizations , author=. Mathematics of Computation , volume=. 1974 , doi =

1974

[19] [19]

Nature Machine Intelligence , volume=

Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead , author=. Nature Machine Intelligence , volume=. 2019 , doi =

2019

[20] [20]

Statistics Surveys , volume=

Interpretable machine learning: Fundamental principles and 10 grand challenges , author=. Statistics Surveys , volume=. 2022 , doi =

2022

[21] [21]

Journal of the American Statistical Association , volume=

Problems in the analysis of survey data, and a proposal , author=. Journal of the American Statistical Association , volume=. 1963 , publisher=

1963

[22] [22]

1977 , doi =

An Algorithm for Constructing Optimal Binary Decision Trees , author =. 1977 , doi =

1977

[23] [23]

Australian Joint Conference on Artificial Intelligence , volume=

Learning with continuous classes , author=. Australian Joint Conference on Artificial Intelligence , volume=. 1992 , organization=

1992

[24] [24]

Statistica Sinica , volume =

Regression trees with unbiased variable selection and interaction detection , author=. Statistica Sinica , volume =

[25] [25]

Journal of Machine Learning Research , volume=

Murtree: Optimal decision trees via dynamic programming and search , author=. Journal of Machine Learning Research , volume=

[26] [26]

Fast linear model trees by

Raymaekers, Jakob and Rousseeuw, Peter J and Verdonck, Tim and Yao, Ruicong , journal=. Fast linear model trees by. 2024 , publisher=

2024

[27] [27]

International Statistical Review , volume=

Fifty years of classification and regression trees , author=. International Statistical Review , volume=. 2014 , publisher=

2014

[28] [28]

Artificial Intelligence Review , volume=

Recent advances in decision trees: an updated survey , author=. Artificial Intelligence Review , volume=. 2023 , publisher=

2023

[29] [29]

Machine Learning , volume=

Optimal classification trees , author=. Machine Learning , volume=. 2017 , publisher=

2017

[30] [30]

2013 , publisher=

Matrix Computations , author=. 2013 , publisher=

2013

[31] [31]

Proceedings of the

Learning optimal decision trees using caching branch-and-bound search , author=. Proceedings of the

[32] [32]

Proceedings of the 41st International Conference on Machine Learning (ICML 2024) , pages =

Piecewise Constant and Linear Regression Trees: An Optimal Dynamic Programming Approach , author =. Proceedings of the 41st International Conference on Machine Learning (ICML 2024) , pages =. 2024 , volume =

2024

[33] [33]

Proceedings of the

Optimal sparse regression trees , author=. Proceedings of the

[34] [34]

2004 , publisher=

Top-down induction of model trees with regression and splitting nodes , author=. 2004 , publisher=

2004

[35] [35]

Advances in Neural Information Processing Systems , pages =

Optimal Sparse Decision Trees , author =. Advances in Neural Information Processing Systems , pages =

[36] [36]

arXiv preprint arXiv:2409.12788 , year =

Optimal or Greedy Decision Trees? Revisiting their Objectives, Tuning, and Performance , author =. arXiv preprint arXiv:2409.12788 , year =

arXiv

[37] [37]

1984 , publisher=

Classification and Regression Trees , author=. 1984 , publisher=

1984

[38] [38]

5: Programs for Machine Learning , author=

C4. 5: Programs for Machine Learning , author=. 1993 , publisher=

1993

[39] [39]

Advances in Neural Information Processing Systems , volume=

Harnessing the power of choices in decision tree learning , author=. Advances in Neural Information Processing Systems , volume=

[40] [40]

Breiman meets

Kohler, Hector and Akrour, Riad and Preux, Philippe , booktitle=. Breiman meets. 2025 , doi =

2025

[41] [41]

Mazumder, Rahul and Meng, Xiang and Wang, Haoyue , booktitle =. Quant-. 2022 , volume =

2022

[42] [42]

Proceedings of the

Optimal Classification Trees for Continuous Feature Data Using Dynamic Programming with Branch-and-Bound , author=. Proceedings of the

[43] [43]

Proceedings of the European Conference on Machine Learning , pages=

Inducing model trees for continuous classes , author=. Proceedings of the European Conference on Machine Learning , pages=

[44] [44]

Proceedings of the 11th Innovations in Theoretical Computer Science Conference (ITCS 2020) , series =

Top-Down Induction of Decision Trees: Rigorous Guarantees and Inherent Limitations , author =. Proceedings of the 11th Innovations in Theoretical Computer Science Conference (ITCS 2020) , series =. 2020 , doi =

2020

[45] [45]

Dua, Dheeru and Graff, Casey , year =

[46] [46]

2018 , howpublished =

Choi, Miri , title =. 2018 , howpublished =

2018

[47] [47]

Proceedings of the

Optuna: A next-generation hyperparameter optimization framework , author=. Proceedings of the. 2019 , doi =

2019

[48] [48]

2017 , howpublished =

California Housing Prices , author =. 2017 , howpublished =

2017

[49] [49]

2021 , howpublished =

Walmart Dataset , author =. 2021 , howpublished =

2021

[50] [50]

Decision trees: from efficient prediction to responsible

Blockeel, Hendrik and Devos, Laurens and Fr. Decision trees: from efficient prediction to responsible. Frontiers in Artificial Intelligence , volume=. 2023 , publisher=

2023

[51] [51]

Journal of Heuristics , volume=

Rollout algorithms for combinatorial optimization , author=. Journal of Heuristics , volume=. 1997 , publisher=

1997

[52] [52]

Annals of Operations Research , volume=

Looking ahead with the pilot method , author=. Annals of Operations Research , volume=. 2005 , publisher=

2005

[53] [53]

2002 , edition =

Accuracy and Stability of Numerical Algorithms , author =. 2002 , edition =

2002