PolySHAP: Extending KernelSHAP with Interaction-Informed Polynomial Regression
Pith reviewed 2026-05-16 11:14 UTC · model grok-4.3
The pith
PolySHAP shows that the paired sampling heuristic in KernelSHAP produces exactly the same Shapley approximations as fitting a second-degree polynomial.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By replacing the linear regression step in KernelSHAP with a polynomial regression of chosen degree, the method obtains Shapley value estimates that account for feature interactions and converge to the true values. The central technical result is that antithetic paired sampling, which evaluates complementary feature subsets together, yields coefficient estimates identical to those obtained by explicitly solving the second-order polynomial least-squares problem, without ever constructing or inverting the associated design matrix.
What carries the argument
The equivalence between antithetic paired sampling and the closed-form solution of the degree-2 polynomial regression problem inside the KernelSHAP sampling distribution.
If this is right
- Shapley estimates improve because quadratic terms capture pairwise feature interactions that linear KernelSHAP ignores.
- The paired sampling procedure can be used directly without solving any polynomial system.
- Higher-degree versions of PolySHAP remain consistent estimators.
- Empirical accuracy gains appear across multiple tabular datasets and model types.
Where Pith is reading between the lines
- The same equivalence argument might extend to higher even degrees if an appropriate paired or grouped sampling scheme can be defined.
- PolySHAP could be combined with other variance-reduction techniques such as control variates without losing the sampling-to-polynomial identity.
- If the value function deviates strongly from low-degree polynomials, both paired sampling and PolySHAP will degrade at the same rate.
Load-bearing premise
The model's value function admits a useful low-degree polynomial approximation when features are included or excluded according to the uniform random subset distribution used by KernelSHAP.
What would settle it
Run both the paired sampling procedure and an explicit least-squares fit of a degree-2 polynomial on the same set of feature-subset evaluations and check whether the resulting Shapley value vectors are numerically identical for every test model.
read the original abstract
Shapley values have emerged as a central game-theoretic tool in explainable AI (XAI). However, computing Shapley values exactly requires $2^d$ game evaluations for a model with $d$ features. Lundberg and Lee's KernelSHAP algorithm has emerged as a leading method for avoiding this exponential cost. KernelSHAP approximates Shapley values by approximating the game as a linear function, which is fit using a small number of game evaluations for random feature subsets. In this work, we extend KernelSHAP by approximating the game via higher degree polynomials, which capture non-linear interactions between features. Our resulting PolySHAP method yields empirically better Shapley value estimates for various benchmark datasets, and we prove that these estimates are consistent. Moreover, we connect our approach to paired sampling (antithetic sampling), a ubiquitous modification to KernelSHAP that improves empirical accuracy. We prove that paired sampling outputs exactly the same Shapley value approximations as second-order PolySHAP, without ever fitting a degree 2 polynomial. To the best of our knowledge, this finding provides the first strong theoretical justification for the excellent practical performance of the paired sampling heuristic.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes PolySHAP as an extension of KernelSHAP that approximates the cooperative game value function using higher-degree polynomials rather than linear functions, thereby capturing feature interactions. It reports empirical improvements in Shapley value estimates on benchmark datasets, proves consistency of the resulting estimates, and establishes an algebraic equivalence showing that paired (antithetic) sampling produces identical approximations to those obtained from explicitly fitting a degree-2 polynomial under the KernelSHAP sampling distribution, without performing the fit.
Significance. If the equivalence holds, the work supplies the first explicit theoretical justification for the strong empirical performance of the paired-sampling heuristic commonly used with KernelSHAP. The polynomial formulation also offers a principled route to incorporating interactions while preserving the sampling-based estimation framework, which could improve explanation fidelity in settings where low-order feature interactions are present.
major comments (1)
- The consistency proof for PolySHAP (mentioned in the abstract) must be checked against the precise sampling distribution and kernel weights inherited from KernelSHAP; any unstated regularity conditions on the value function or on the growth of the number of samples would affect whether the result is load-bearing for the claimed improvements.
minor comments (2)
- Abstract: the statement that PolySHAP 'yields empirically better' estimates should be accompanied by the specific datasets, metrics, and quantitative deltas reported in the experimental section.
- The description of the polynomial basis and the fitting procedure should explicitly state how the design matrix is constructed from the same feature-subset samples used by KernelSHAP, to make the connection to paired sampling fully transparent.
Simulated Author's Rebuttal
We thank the referee for their careful reading and constructive comment regarding the consistency proof. We address the point below and will revise the manuscript to strengthen the presentation of the result.
read point-by-point responses
-
Referee: The consistency proof for PolySHAP (mentioned in the abstract) must be checked against the precise sampling distribution and kernel weights inherited from KernelSHAP; any unstated regularity conditions on the value function or on the growth of the number of samples would affect whether the result is load-bearing for the claimed improvements.
Authors: We agree that the consistency proof requires explicit alignment with the KernelSHAP sampling distribution and kernel weights. In the revised manuscript we will expand the relevant theorem and proof to (i) directly invoke the exact sampling distribution and weighted least-squares objective from Lundberg and Lee (2017), (ii) state the regularity conditions (bounded second moments of the value function and n → ∞ with appropriate rate), and (iii) clarify that consistency holds in probability under these conditions. The core algebraic steps remain unchanged; the revision only makes the assumptions and sampling reference transparent. revision: yes
Circularity Check
No significant circularity in derivation chain
full rationale
The paper derives a mathematical equivalence showing that paired sampling produces identical Shapley approximations to explicit second-order polynomial fitting under the KernelSHAP distribution. This is presented as an algebraic identity proven from the paper's own polynomial formulation and sampling scheme, without reducing to any fitted parameter or prior result by construction. Consistency proofs for the PolySHAP estimator and empirical improvements are independent of this equivalence. No self-citations are load-bearing for the central claims, and the derivation does not rename known results or smuggle ansatzes. The argument remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption The cooperative game value function admits a low-degree polynomial approximation under random feature subset sampling.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We prove that paired sampling outputs exactly the same Shapley value approximations as second-order PolySHAP, without ever fitting a degree 2 polynomial (Theorem 5.1).
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
PolySHAP representation ϕI[ν] := arg min ... interaction frontier I ⊆ {T⊆D : |T|≥2}
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 2 Pith papers
-
QuadraSHAP: Stable and Scalable Shapley Values for Product Games via Gauss-Legendre Quadrature
Shapley values in product games equal an exact one-dimensional integral of a polynomial, computable via Gauss-Legendre quadrature with linear cost in the number of features.
-
QuadraSHAP: Stable and Scalable Shapley Values for Product Games via Gauss-Legendre Quadrature
Shapley values in product games equal the integral of a degree-(d-1) polynomial over [0,1], allowing provably exact or near-exact computation via Gauss-Legendre quadrature with O(d m_q) work.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.