pith. machine review for the scientific record.

arxiv: 2605.12983 · v1 · submitted 2026-05-13 · 💻 cs.LG · cs.CC

Recognition: 2 Lean theorem links

Decision Tree Learning on Product Spaces

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 20:06 UTC · model grok-4.3

classification 💻 cs.LG cs.CC
keywords decision trees · greedy heuristic · product distributions · size bounds · approximation algorithms · learning theory · parameter-free algorithms

The pith

For functions exactly computable by an optimal decision tree, the greedy heuristic produces an ε-approximating tree whose size is bounded by exp(Δ_opt D_opt log(e/ε)) under any product distribution.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper proves that the standard top-down greedy method for building decision trees works with explicit size guarantees when the input features are independent. If a target function is exactly represented by some optimal tree of size s with maximum depth D_opt and average depth Δ_opt, then greedy finds a tree within ε error whose size grows at most exponentially in the product Δ_opt D_opt log(e/ε). The bound applies to arbitrary product distributions rather than only the uniform case, and a version of the algorithm is given that needs no advance knowledge of s or the depths. These results make the theoretical analysis of a widely used practical method applicable to more realistic data settings.

Core claim

For any function f computable by an optimal decision tree of size s, maximum depth D_opt, and average depth Δ_opt, the greedy heuristic constructs an ε-approximating tree whose size grows at most as exp(Δ_opt D_opt log(e/ε)). In the special case where the optimal tree is a full binary tree, this bound improves upon the bound of Blanc et al. (ITCS 2020) and holds under a strictly broader class of distributions. The paper also presents an algorithm based on the top-down greedy heuristic that is entirely parameter-free.
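Stated compactly (a hedged reading of the abstract, taking "grows at most with" to allow a constant factor in the exponent): under the product distribution $\mathcal{D}$, the greedy tree $T$ satisfies

\[
\Pr_{x \sim \mathcal{D}}\bigl[T(x) \neq f(x)\bigr] \le \epsilon
\quad\text{with}\quad
|T| \le \exp\bigl(O\bigl(\Delta_{\text{opt}}\, D_{\text{opt}}\, \log(e/\epsilon)\bigr)\bigr).
\]

Note that $s$ enters only through the existence assumption; whether the constructed tree's size should carry explicit dependence on $s$ is exactly what minor comment 1 of the referee report below asks the authors to clarify.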

What carries the argument

The top-down greedy heuristic, which recursively splits on the variable of highest influence under the given product distribution.
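A minimal sketch of this loop, assuming binary features drawn with per-coordinate biases and a resampling-based influence estimate weighted by leaf mass; the names, the estimator, and the stopping rule here are illustrative assumptions, not the paper's exact Algorithm 2.

    import random

    def sample(bias):
        # One draw from the product distribution: coordinate i is 1 with prob. bias[i].
        return tuple(int(random.random() < p) for p in bias)

    def leaf_mass(leaf, bias):
        # Probability that a random draw reaches this leaf (product of edge probabilities).
        m = 1.0
        for j, v in leaf.items():
            m *= bias[j] if v == 1 else 1.0 - bias[j]
        return m

    def influence(f, leaf, i, bias, n_samples=2000):
        # Estimated probability that resampling coordinate i flips f(x),
        # conditioned on the path constraints in `leaf` (a dict: coordinate -> value).
        flips = 0
        for _ in range(n_samples):
            x = list(sample(bias))
            for j, v in leaf.items():
                x[j] = v
            y = list(x)
            y[i] = 1 - y[i]
            flips += f(tuple(x)) != f(tuple(y))
        return flips / n_samples

    def greedy_tree(f, n, bias, max_leaves):
        # Repeatedly split the (leaf, variable) pair with the highest mass-weighted
        # influence score. Assumes max_leaves <= 2**n so a splittable pair exists.
        leaves = [dict()]
        while len(leaves) < max_leaves:
            leaf, i = max(
                ((l, j) for l in leaves for j in range(n) if j not in l),
                key=lambda lj: leaf_mass(lj[0], bias) * influence(f, lj[0], lj[1], bias),
            )
            leaves.remove(leaf)
            leaves += [{**leaf, i: 0}, {**leaf, i: 1}]
        return leaves

For instance, greedy_tree(lambda x: int(sum(x) >= 2), 3, [0.3] * 3, 4) grows a four-leaf tree for 3-bit majority under bias-0.3 coordinates.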

If this is right

  • The size of the greedy tree is bounded exponentially in the product of the optimal tree's average depth and maximum depth.
  • The same exponential bound holds for every product distribution, not only the uniform distribution.
  • For full binary optimal trees the new bound is strictly smaller than the previous uniform-distribution bound.
  • A parameter-free variant of the greedy algorithm exists that achieves the same guarantee without knowing the optimal size or depths in advance.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The bound gives a concrete rule of thumb for how large a greedy tree must be grown to reach a target accuracy when features are independent.
  • The result suggests that greedy trees may remain efficient on real data sets whose features are only approximately independent.
  • The analysis could be tested by measuring how quickly the greedy size grows on synthetic product distributions with known optimal trees; a rough harness is sketched below.
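Continuing the illustrative sketch from "What carries the argument" above (reusing its sample and greedy_tree; the leaf labels and error are Monte Carlo estimates, a stand-in for whatever measurement the paper's experiments use):

    def leaf_label(f, leaf, bias, n_samples=500):
        # Majority value of f conditioned on the leaf's path constraints.
        ones = 0
        for _ in range(n_samples):
            x = list(sample(bias))
            for j, v in leaf.items():
                x[j] = v
            ones += f(tuple(x))
        return int(2 * ones >= n_samples)

    def tree_error(f, leaves, bias, n_samples=4000):
        # Pr[leaf label != f(x)] under the product distribution.
        labels = {frozenset(l.items()): leaf_label(f, l, bias) for l in leaves}
        wrong = 0
        for _ in range(n_samples):
            x = sample(bias)
            leaf = next(l for l in leaves if all(x[j] == v for j, v in l.items()))
            wrong += labels[frozenset(leaf.items())] != f(x)
        return wrong / n_samples

    # How many leaves does greedy need to reach each target error epsilon?
    f, bias = lambda x: int(sum(x) >= 2), [0.3] * 3
    for eps in (0.2, 0.1, 0.05):
        size = 2
        while tree_error(f, greedy_tree(f, 3, bias, size), bias) > eps:
            size += 1
        print(eps, size)

Plotting the recorded size against exp(Δ_opt D_opt log(e/ε)) for trees with known Δ_opt and D_opt would give a direct empirical check on the bound's shape.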

Load-bearing premise

The target function must be exactly computable by some optimal decision tree, and the underlying distribution must be a product distribution.

What would settle it

A concrete counterexample would be a specific function computable by a small optimal decision tree under some product distribution for which every greedy tree achieving ε error has size strictly larger than exp(Δ_opt D_opt log(e/ε)).

Figures

Figures reproduced from arXiv: 2605.12983 by Arshia Soltani Moakahr, Faraz Ghahremani, Kiarash Banihashem, MohammadTaghi Hajiaghayi.

Figure 1: A visual depiction of one iteration of the practical top-down heuristic Algorithm 2. Step 1: The algorithm identifies the leaf and the variable that provide the best split based on estimated scores. Step 2: The samples stored at the selected leaf are partitioned between the two new children based on the splitting rule. Step 3: New random samples are drawn and added to the sample sets of all leaves in the t…
Figure 2: Empirical evaluation of the practical, parameter-free top-down heuristic (Algorithm 2) on a balanced DT (left panel of each subfigure) and an unbalanced DT (right panel of each subfigure). Curves correspond to product distributions with bias p ∈ {0.5, 0.3, 0.1}; error bars show ±1 standard deviation across the 6 repetitions.
read the original abstract

Decision tree learning has long been a central topic in theoretical computer science, driven by its practical importance. A fundamental and widely used method for decision tree construction is the top-down greedy heuristic, which recursively splits on the most influential variable. Despite its empirical success, theoretical analysis of this heuristic has been limited. A recent breakthrough by Blanc et al. (ITCS, 2020) provided the first rigorous theoretical guarantees for the greedy approach, but only under the uniform distribution. We extend this analysis to the more general and practically relevant setting of arbitrary product distributions. Our main result shows that for any function $f$ computable by an optimal decision tree of size $s$, maximum depth $D_{\text{opt}}$, and average depth $\Delta_{\text{opt}}$, the greedy heuristic constructs an $\epsilon$-approximating tree whose size grows at most with $\exp\bigl(\Delta_{\text{opt}} D_{\text{opt}} \log(e/\epsilon)\bigr)$. In the special case where the optimal tree is a full binary tree, this bound improves upon the bound of Blanc et al. and holds under a strictly broader class of distributions. Moreover, we present an algorithm based on the top-down greedy heuristic that is entirely parameter-free -- it requires no prior knowledge of the optimal tree's size or depth -- offering a practical advantage over Blanc et al.'s method.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript extends the analysis of the top-down greedy heuristic for decision tree learning from uniform distributions (Blanc et al., ITCS 2020) to arbitrary product distributions. For any function exactly representable by an optimal decision tree of size s with maximum depth D_opt and average depth Δ_opt, the greedy method produces an ε-approximating tree whose size is at most exp(Δ_opt D_opt log(e/ε)). A parameter-free variant of the algorithm is presented that requires no prior knowledge of s, D_opt, or Δ_opt.

Significance. If the central bound holds, the work supplies the first rigorous size guarantees for greedy decision tree construction under product distributions, a setting more relevant to practice than the uniform case. The explicit dependence on optimal-tree parameters and the parameter-free algorithm are concrete strengths that could support both theoretical extensions and algorithmic implementations.

minor comments (2)
  1. [Abstract] The size bound is stated without explicit dependence on s; clarify whether the constructed tree's size is independent of the optimal tree size or whether s enters the analysis only through the existence assumption.
  2. [Abstract] The claimed improvement over Blanc et al. in the full-binary-tree case is stated qualitatively; a direct comparison of the two bounds (including the distribution class) would strengthen the presentation.

Simulated Authors' Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of our work and the recommendation for minor revision. The manuscript provides the first rigorous size guarantees for the top-down greedy heuristic under arbitrary product distributions, extending the uniform-distribution analysis of Blanc et al. (ITCS 2020). The central bound depends explicitly on the optimal tree parameters D_opt and Δ_opt, and we also present a fully parameter-free algorithm. Since the report contains no specific major comments, we have no individual points to address.

Circularity Check

1 step flagged

Minor self-citation to prior work; central bound derived independently for product distributions

specific steps
  1. self-citation, load-bearing [Abstract]
    "A recent breakthrough by Blanc et al. (ITCS, 2020) provided the first rigorous theoretical guarantees for the greedy approach, but only under the uniform distribution. We extend this analysis to the more general and practically relevant setting of arbitrary product distributions."

    The central result is framed as an extension of the cited prior analysis; while the extension itself introduces new content via the product-distribution setting, the load-bearing guarantee still rests on the correctness of the referenced Blanc et al. theorem rather than a fully self-contained re-derivation.

full rationale

The derivation extends Blanc et al. (2020) by replacing the uniform distribution with an arbitrary product measure while keeping the same greedy algorithm and expressing the size bound solely in terms of the optimal tree's Δ_opt and D_opt. No equation reduces the claimed bound to a fitted parameter defined from the target result itself, and the new distribution class supplies independent grounding. The self-citation is present but not load-bearing for the novel claim.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that the distribution factors as a product over independent coordinates and on standard properties of decision tree size and depth measures; no free parameters or new invented entities are introduced.

axioms (1)
  • domain assumption: The input distribution is a product distribution over the feature space.
    Explicitly stated as the generalization beyond uniform; invoked to extend the analysis.
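For concreteness, the standard definition this axiom invokes (and the one matching the bias-p distributions in Figure 2): a product distribution over x = (x_1, …, x_n) factors coordinate-wise,

\[
\Pr_{\mathcal{D}}[x] = \prod_{i=1}^{n} \Pr_{\mathcal{D}_i}[x_i],
\qquad \text{e.g. on } \{0,1\}^n:\ \Pr[x_i = 1] = p_i \text{ independently},
\]

with the uniform distribution recovered as the special case $p_i = 1/2$ for every $i$.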

pith-pipeline@v0.9.0 · 5564 in / 1213 out tokens · 57326 ms · 2026-05-14T20:06:14.894522+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

20 extracted references · 20 canonical work pages · 1 internal anchor

  1. Guy Blanc, Jane Lange, and Li-Yang Tan. Top-Down Induction of Decision Trees: Rigorous Guarantees and Inherent Limitations. ITCS 2020. doi:10.4230/LIPICS.ITCS.2020.44
  2. Ryan O'Donnell, Michael Saks, Oded Schramm, and Rocco A. Servedio. Every decision tree has an influential variable. FOCS 2005.
  3. Decision tree approximations of Boolean functions. Theoretical Computer Science, 2002.
  4. Andrzej Ehrenfeucht and David Haussler. Learning decision trees from random examples. Information and Computation, 1989.
  5. Parikshit Gopalan, Adam Tauman Kalai, and Adam R. Klivans. Agnostically learning decision trees. STOC 2008.
  6. Guy Blanc, Jane Lange, Mingda Qiao, and Li-Yang Tan. Properly learning decision trees in almost polynomial time. Journal of the ACM, 2022.
  7. Caleb Koch, Carmen Strassle, and Li-Yang Tan. Superpolynomial lower bounds for decision tree learning and testing. SODA 2023.
  8. Alon Brutzkus, Amit Daniely, and Eran Malach. ID3 learns juntas for smoothed product distributions. COLT 2020.
  9. Adam Tauman Kalai and Shang-Hua Teng. Decision trees are PAC-learnable from most product distributions: a smoothed analysis. arXiv:0812.0933, 2008.
  10. Guy Blanc, Jane Lange, Mingda Qiao, and Li-Yang Tan. Open Problem: Properly learning decision trees in polynomial time? COLT 2022.
  11. Caleb Koch, Carmen Strassle, and Li-Yang Tan. Superconstant inapproximability of decision tree learning. COLT 2024.
  12. Ryan O'Donnell and Rocco A. Servedio. Learning monotone decision trees in polynomial time. SIAM Journal on Computing, 2007.
  13. Michael Saks and Avi Wigderson. Probabilistic Boolean decision trees and the complexity of evaluating game trees. FOCS 1986.
  14. Eyal Kushilevitz and Yishay Mansour. Learning decision trees using the Fourier spectrum. STOC 1991.
  15. J. Ross Quinlan. Induction of decision trees. Machine Learning, 1986.
  16. J. Ross Quinlan. C4.5: Programs for Machine Learning. 2014.
  17. Leo Breiman, Jerome Friedman, Richard Olshen, and Charles Stone. Classification and Regression Trees. 2017.
  18. Jason M. Klusowski and Peter M. Tian. Large scale prediction with decision trees. Journal of the American Statistical Association, 2024.
  19. Sebastian Ordyniak and Stefan Szeider. Parameterized complexity of small decision tree learning. Proceedings of the AAAI Conference on Artificial Intelligence.
  20. Active Learning for Decision Trees with Provable Guarantees. Fourteenth International Conference on Learning Representations (ICLR).