pith. sign in

arxiv: 2511.09376 · v2 · submitted 2025-11-12 · 💻 cs.LG

From Decision Trees to Boolean Logic: A Fast and Unified SHAP Algorithm

Pith reviewed 2026-05-17 23:18 UTC · model grok-4.3

classification 💻 cs.LG
keywords SHAPdecision tree ensemblespseudo-Boolean logicmodel interpretabilitygame-theoretic explanationslinear-time algorithmsBackground SHAPBoolean formulas
0
0 comments X

The pith

WOODELF computes Background SHAP for tree ensembles in linear time by building one pseudo-Boolean formula per sample that encodes the trees, features, and full background dataset.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents WOODELF as a method that integrates decision trees, game theory, and Boolean logic to compute SHAP values. For each data point it constructs a pseudo-Boolean formula that simultaneously represents the ensemble structure, the point's feature values, and every row in the background dataset. Evaluating the formula then produces exact Background SHAP contributions in time linear in the combined input size. A sympathetic reader would care because Background SHAP is the more faithful variant used in regulated domains, yet prior implementations scale poorly on the million-row datasets common in practice. The same logical representation also yields Path-Dependent SHAP, Shapley interactions, Banzhaf values, and Banzhaf interactions at comparable cost.

Core claim

WOODELF constructs, for each consumer, a pseudo-Boolean formula that captures their feature values, the structure of the decision tree ensemble, and the entire background dataset, then leverages this representation to compute Background SHAP in linear time while also supporting Path-Dependent SHAP, Shapley interaction values, Banzhaf values, and Banzhaf interaction values.

What carries the argument

The pseudo-Boolean formula that encodes a sample's features together with the full decision-tree ensemble and background dataset into a single logical expression for direct evaluation of game-theoretic values.

If this is right

  • Background SHAP becomes practical on datasets with millions of background samples where previous exact methods timed out.
  • A single code path can now produce both Background and Path-Dependent SHAP as well as interaction and Banzhaf variants.
  • Standard Python array libraries suffice for both CPU and GPU execution, removing the need for custom low-level kernels.
  • Reported wall-clock times drop from tens of minutes to under 20 seconds on GPU for three-million-row problems with five-million background samples.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The Boolean reduction may let similar linear-time techniques be applied to explanation methods for models other than trees if those models admit compact logical encodings.
  • Practitioners could shift from approximate to exact Background SHAP more often once the computational barrier is lowered.
  • The formula representation might support additional sensitivity queries beyond SHAP at marginal extra cost.

Load-bearing premise

The pseudo-Boolean formula can be built and evaluated without exponential blow-up for realistic tree depths and background sizes while exactly preserving the original game-theoretic SHAP definition.

What would settle it

Running WOODELF and an exhaustive coalition-enumeration implementation on the same small tree ensemble and background set and checking whether the numeric SHAP vectors match to machine precision.

Figures

Figures reproduced from arXiv: 2511.09376 by Alexander Nadel, Ron Wettenstein.

Figure 1
Figure 1. Figure 1: An illustration how both PB functions and decision trees relate to well-established concepts in game theory. In [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: An illustration of the WDNF construction process on a small example. The consumer and baseline values are shown [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: A simple decision tree illustrating the excepted [PITH_FULL_IMAGE:figures/full_fig_p010_3.png] view at source ↗
read the original abstract

SHapley Additive exPlanations (SHAP) is a key tool for interpreting decision tree ensembles by assigning contribution values to features. It is widely used in finance, advertising, medicine, and other domains. Two main approaches to SHAP calculation exist: Path-Dependent SHAP, which leverages the tree structure for efficiency, and Background SHAP, which uses a background dataset to estimate feature distributions. We introduce WOODELF, a SHAP algorithm that integrates decision trees, game theory, and Boolean logic into a unified framework. For each consumer, WOODELF constructs a pseudo-Boolean formula that captures their feature values, the structure of the decision tree ensemble, and the entire background dataset. It then leverages this representation to compute Background SHAP in linear time. WOODELF can also compute Path-Dependent SHAP, Shapley interaction values, Banzhaf values, and Banzhaf interaction values. WOODELF is designed to run efficiently on CPU and GPU hardware alike. Available via the WOODELF Python package, it is implemented using NumPy, SciPy, and CuPy without relying on custom C++ or CUDA code. This design enables fast performance and seamless integration into existing frameworks, supporting large-scale computation of SHAP and other game-theoretic values in practice. For example, on a dataset with 3,000,000 rows, 5,000,000 background samples, and 127 features, WOODELF computed all Background Shapley values in 162 seconds on CPU and 16 seconds on GPU - compared to 44 minutes required by the best method on any hardware platform, representing 16x and 165x speedups, respectively.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript introduces WOODELF, a unified SHAP algorithm for decision tree ensembles that constructs a per-instance pseudo-Boolean formula incorporating the instance feature values, the full tree ensemble structure, and the entire background dataset. This representation is used to compute Background SHAP values in linear time; the method also supports Path-Dependent SHAP, Shapley interaction values, Banzhaf values, and Banzhaf interaction values. The implementation uses only NumPy, SciPy, and CuPy (no custom C++ or CUDA) for CPU/GPU execution and reports concrete speedups, e.g., 162 s (CPU) and 16 s (GPU) versus 44 min for the best prior method on a dataset with 3 000 000 rows, 5 000 000 background samples, and 127 features.

Significance. If the pseudo-Boolean construction exactly reproduces the game-theoretic value function without hidden exponential steps or distributional approximations, the work would constitute a substantial practical advance in scalable model interpretation for tree ensembles. The reported speedups on very large data, the avoidance of low-level custom kernels, and the unification of multiple explanation types (Shapley, Banzhaf, interactions) are clear strengths that would be of immediate interest to practitioners in finance, medicine, and advertising.

major comments (1)
  1. [Abstract] Abstract: the central linear-time claim for Background SHAP rests on the assertion that a single pseudo-Boolean formula per instance 'captures ... the entire background dataset' and yields exact SHAP values. No construction rule, equivalence proof, or complexity argument is supplied showing that the Boolean encoding preserves the precise empirical conditional expectation v(S) = (1/|D|) sum_{x in D} f(x_S, x'_S^c) without introducing independence assumptions or collapsing the background distribution. This equivalence is load-bearing for both correctness and the O(n) runtime guarantee.
minor comments (2)
  1. The abstract uses the term 'consumer' without definition; the manuscript should clarify whether the algorithm is restricted to a particular application domain or applies to arbitrary decision-tree ensembles.
  2. The speed-up numbers are given for a single large instance; a table or figure reporting scaling with background size, number of trees, and feature count would strengthen the empirical section.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their careful reading and constructive feedback. We address the major comment regarding the abstract and the supporting details for the Background SHAP claims below.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central linear-time claim for Background SHAP rests on the assertion that a single pseudo-Boolean formula per instance 'captures ... the entire background dataset' and yields exact SHAP values. No construction rule, equivalence proof, or complexity argument is supplied showing that the Boolean encoding preserves the precise empirical conditional expectation v(S) = (1/|D|) sum_{x in D} f(x_S, x'_S^c) without introducing independence assumptions or collapsing the background distribution. This equivalence is load-bearing for both correctness and the O(n) runtime guarantee.

    Authors: We agree that the abstract, owing to length constraints, does not supply the explicit construction rule, equivalence proof, or complexity argument. The current manuscript describes the high-level approach but lacks a formal statement of how the single pseudo-Boolean formula is built to encode trees, instance values, and the full background dataset, as well as a proof that it exactly realizes v(S) without independence assumptions or distributional collapse. We will add these elements in the revision: a detailed construction procedure in Section 3, a theorem with proof in a new appendix establishing exact equivalence to the empirical conditional expectation, and a complexity section showing that formula evaluation yields linear time in the number of features. These changes will directly support the correctness and runtime claims. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper presents WOODELF as a direct algorithmic construction: for each instance it builds a pseudo-Boolean formula encoding the decision-tree ensemble plus the full background dataset, then evaluates Background SHAP from that representation in linear time. No equations, self-citations, or fitted parameters are shown that reduce the claimed linear-time result to a tautology or to the input data by construction. The central claim is an independent encoding step whose correctness is an external verification question, not a definitional loop. This matches the default expectation of a non-circular algorithmic paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the assumption that the pseudo-Boolean encoding is both faithful to the original SHAP definition and computationally tractable; these properties are asserted but not derived in the provided abstract.

axioms (1)
  • standard math Shapley values are well-defined for the cooperative game induced by a tree ensemble and background distribution
    Implicit foundation for all SHAP variants mentioned.
invented entities (1)
  • pseudo-Boolean formula per consumer no independent evidence
    purpose: Unified representation that encodes feature values, tree structure, and full background dataset for linear-time evaluation
    Core new construct introduced to achieve the claimed efficiency.

pith-pipeline@v0.9.0 · 5608 in / 1384 out tokens · 59905 ms · 2026-05-17T23:18:02.739982+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

6 extracted references · 6 canonical work pages · 1 internal anchor

  1. [1]

    In2024 IEEE Recent Advances in Intelligent Computational Systems (RAICS), 1–8

    A Machine Learning Approach for Credit Card Fraud Detection in Massive Datasets Using SMOTE and Random Sampling. In2024 IEEE Recent Advances in Intelligent Computational Systems (RAICS), 1–8. Banzhaf, J. F. 1965. Weighted V oting Doesn’t Work: A Mathematical Analysis.Rutgers Law Review, 19: 317–343. Bracci, A. 2021. background data size limited to 100, xg...

  2. [2]

    Consistent Individualized Feature Attribution for Tree Ensembles

    The Shapley Value of Tuples in Query Answering. Logical Methods in Computer Science, V olume 17, Issue 3. Lundberg, S.; and contributors, S. 2024. SHAP GPUTree Explainer Example. https://shap.readthedocs.io/en/latest/ example notebooks/api examples/explainers/GPUTree. html. Lundberg, S. M.; Erion, G.; Chen, H.; DeGrave, A.; Prutkin, J. M.; Nair, B.; Katz,...

  3. [3]

    In 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, 1–6

    A detailed analysis of the KDD CUP 99 data set. In 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, 1–6. Xiao, Z. 2024. IEEE-CIS Fraud Detection Based on XGB. In Li, X.; Yuan, C.; and Kent, J., eds.,Proceedings of the 7th International Conference on Economic Management and Green Development, 1785–1796. Singapore: Sp...

  4. [4]

    simplified

    + 2·(1−p(f 2 <3)). Baseline SHAPcan not estimate the full probability as it only has a single baseline value forf 2.Path-Dependent SHAPestimates this probability using the cover property, which is based on the training data that reached the split node in the training process. Since only rows withX[f 1]< 4reached this node, the estimate is conditioned onX[...

  5. [5]

    The original weight of the clause was multiplied by−1

  6. [6]

    A direct corollary of Theorem 4 is the robustness of βsimplified i andϕ simplified i also w.r.t WCNF formulas

    Using de-Morgan’s laws: |S+(ck)|=|S −(¬ck)|,|S −(ck)|=|S +(¬ck)| When applying 2 and 3 toβ simplified i andϕ simplified i , both changes cancel out and we are left with the same equa- tions. A direct corollary of Theorem 4 is the robustness of βsimplified i andϕ simplified i also w.r.t WCNF formulas. Corollary 5.For any playeri, bothβ simplified i andϕ si...