PolySHAP: Extending KernelSHAP with Interaction-Informed Polynomial Regression

Christopher Musco; Fabian Fumagalli; R. Teal Witter

arxiv: 2601.18608 · v3 · pith:TRYC5IXEnew · submitted 2026-01-26 · 💻 cs.AI · cs.LG

PolySHAP: Extending KernelSHAP with Interaction-Informed Polynomial Regression

Fabian Fumagalli , R. Teal Witter , Christopher Musco This is my paper

Pith reviewed 2026-05-16 11:14 UTC · model grok-4.3

classification 💻 cs.AI cs.LG

keywords Shapley valuesKernelSHAPpolynomial regressionpaired samplingexplainable AIfeature interactionsconsistency

0 comments

The pith

PolySHAP shows that the paired sampling heuristic in KernelSHAP produces exactly the same Shapley approximations as fitting a second-degree polynomial.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

KernelSHAP approximates Shapley values by fitting a linear model to the value function evaluated on random feature subsets. PolySHAP extends this by fitting higher-degree polynomials that capture interactions among features. The resulting estimates are more accurate on benchmark datasets and remain consistent as the number of samples grows. The paper proves that the common paired sampling modification to KernelSHAP is mathematically identical to second-order PolySHAP, so the heuristic already performs the quadratic regression implicitly. This equivalence supplies the first rigorous account of why paired sampling improves accuracy without extra computation.

Core claim

By replacing the linear regression step in KernelSHAP with a polynomial regression of chosen degree, the method obtains Shapley value estimates that account for feature interactions and converge to the true values. The central technical result is that antithetic paired sampling, which evaluates complementary feature subsets together, yields coefficient estimates identical to those obtained by explicitly solving the second-order polynomial least-squares problem, without ever constructing or inverting the associated design matrix.

What carries the argument

The equivalence between antithetic paired sampling and the closed-form solution of the degree-2 polynomial regression problem inside the KernelSHAP sampling distribution.

If this is right

Shapley estimates improve because quadratic terms capture pairwise feature interactions that linear KernelSHAP ignores.
The paired sampling procedure can be used directly without solving any polynomial system.
Higher-degree versions of PolySHAP remain consistent estimators.
Empirical accuracy gains appear across multiple tabular datasets and model types.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same equivalence argument might extend to higher even degrees if an appropriate paired or grouped sampling scheme can be defined.
PolySHAP could be combined with other variance-reduction techniques such as control variates without losing the sampling-to-polynomial identity.
If the value function deviates strongly from low-degree polynomials, both paired sampling and PolySHAP will degrade at the same rate.

Load-bearing premise

The model's value function admits a useful low-degree polynomial approximation when features are included or excluded according to the uniform random subset distribution used by KernelSHAP.

What would settle it

Run both the paired sampling procedure and an explicit least-squares fit of a degree-2 polynomial on the same set of feature-subset evaluations and check whether the resulting Shapley value vectors are numerically identical for every test model.

read the original abstract

Shapley values have emerged as a central game-theoretic tool in explainable AI (XAI). However, computing Shapley values exactly requires $2^d$ game evaluations for a model with $d$ features. Lundberg and Lee's KernelSHAP algorithm has emerged as a leading method for avoiding this exponential cost. KernelSHAP approximates Shapley values by approximating the game as a linear function, which is fit using a small number of game evaluations for random feature subsets. In this work, we extend KernelSHAP by approximating the game via higher degree polynomials, which capture non-linear interactions between features. Our resulting PolySHAP method yields empirically better Shapley value estimates for various benchmark datasets, and we prove that these estimates are consistent. Moreover, we connect our approach to paired sampling (antithetic sampling), a ubiquitous modification to KernelSHAP that improves empirical accuracy. We prove that paired sampling outputs exactly the same Shapley value approximations as second-order PolySHAP, without ever fitting a degree 2 polynomial. To the best of our knowledge, this finding provides the first strong theoretical justification for the excellent practical performance of the paired sampling heuristic.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper's main advance is proving that paired sampling gives exactly the same Shapley approximations as second-order PolySHAP without fitting any polynomial.

read the letter

The main takeaway is that this paper proves paired sampling produces the exact same Shapley approximations as fitting a second-order polynomial in their PolySHAP framework, without actually doing the fit. That's a solid theoretical result. They extend the standard KernelSHAP linear approximation to higher-degree polynomials to account for interactions between features. The method shows better performance on benchmark datasets and comes with a consistency proof. The connection to paired sampling is the strongest part. It explains why that simple heuristic works well in practice, and the proof is presented as new. Since the equivalence is algebraic, it doesn't rely on the quality of the polynomial fit to the underlying game. One potential soft spot is whether the low-degree polynomial approximation is reliable for the kinds of models people actually use in XAI. The paper assumes it admits a useful approximation under the sampling distribution, but for highly nonlinear models this might not hold strongly. The empirical results would need to be examined closely to see the magnitude of improvement. Overall, this is useful for people building or using explanation methods based on Shapley values. It adds both an algorithmic option and some theory to a common trick. I would bring it to a reading group and think it deserves peer review.

Referee Report

1 major / 2 minor

Summary. The manuscript proposes PolySHAP as an extension of KernelSHAP that approximates the cooperative game value function using higher-degree polynomials rather than linear functions, thereby capturing feature interactions. It reports empirical improvements in Shapley value estimates on benchmark datasets, proves consistency of the resulting estimates, and establishes an algebraic equivalence showing that paired (antithetic) sampling produces identical approximations to those obtained from explicitly fitting a degree-2 polynomial under the KernelSHAP sampling distribution, without performing the fit.

Significance. If the equivalence holds, the work supplies the first explicit theoretical justification for the strong empirical performance of the paired-sampling heuristic commonly used with KernelSHAP. The polynomial formulation also offers a principled route to incorporating interactions while preserving the sampling-based estimation framework, which could improve explanation fidelity in settings where low-order feature interactions are present.

major comments (1)

The consistency proof for PolySHAP (mentioned in the abstract) must be checked against the precise sampling distribution and kernel weights inherited from KernelSHAP; any unstated regularity conditions on the value function or on the growth of the number of samples would affect whether the result is load-bearing for the claimed improvements.

minor comments (2)

Abstract: the statement that PolySHAP 'yields empirically better' estimates should be accompanied by the specific datasets, metrics, and quantitative deltas reported in the experimental section.
The description of the polynomial basis and the fitting procedure should explicitly state how the design matrix is constructed from the same feature-subset samples used by KernelSHAP, to make the connection to paired sampling fully transparent.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their careful reading and constructive comment regarding the consistency proof. We address the point below and will revise the manuscript to strengthen the presentation of the result.

read point-by-point responses

Referee: The consistency proof for PolySHAP (mentioned in the abstract) must be checked against the precise sampling distribution and kernel weights inherited from KernelSHAP; any unstated regularity conditions on the value function or on the growth of the number of samples would affect whether the result is load-bearing for the claimed improvements.

Authors: We agree that the consistency proof requires explicit alignment with the KernelSHAP sampling distribution and kernel weights. In the revised manuscript we will expand the relevant theorem and proof to (i) directly invoke the exact sampling distribution and weighted least-squares objective from Lundberg and Lee (2017), (ii) state the regularity conditions (bounded second moments of the value function and n → ∞ with appropriate rate), and (iii) clarify that consistency holds in probability under these conditions. The core algebraic steps remain unchanged; the revision only makes the assumptions and sampling reference transparent. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper derives a mathematical equivalence showing that paired sampling produces identical Shapley approximations to explicit second-order polynomial fitting under the KernelSHAP distribution. This is presented as an algebraic identity proven from the paper's own polynomial formulation and sampling scheme, without reducing to any fitted parameter or prior result by construction. Consistency proofs for the PolySHAP estimator and empirical improvements are independent of this equivalence. No self-citations are load-bearing for the central claims, and the derivation does not rename known results or smuggle ansatzes. The argument remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on the modeling assumption that the value function of the cooperative game can be usefully approximated by a low-degree polynomial and on standard sampling-based estimation procedures from the KernelSHAP literature.

axioms (1)

domain assumption The cooperative game value function admits a low-degree polynomial approximation under random feature subset sampling.
This replaces the linear assumption of KernelSHAP and is required for both the empirical improvement and the paired-sampling equivalence.

pith-pipeline@v0.9.0 · 5509 in / 1220 out tokens · 32975 ms · 2026-05-16T11:14:21.184479+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We prove that paired sampling outputs exactly the same Shapley value approximations as second-order PolySHAP, without ever fitting a degree 2 polynomial (Theorem 5.1).
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

PolySHAP representation ϕI[ν] := arg min ... interaction frontier I ⊆ {T⊆D : |T|≥2}

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

QuadraSHAP: Stable and Scalable Shapley Values for Product Games via Gauss-Legendre Quadrature
cs.LG 2026-05 unverdicted novelty 7.0

Shapley values in product games equal an exact one-dimensional integral of a polynomial, computable via Gauss-Legendre quadrature with linear cost in the number of features.
QuadraSHAP: Stable and Scalable Shapley Values for Product Games via Gauss-Legendre Quadrature
cs.LG 2026-05 conditional novelty 7.0

Shapley values in product games equal the integral of a degree-(d-1) polynomial over [0,1], allowing provably exact or near-exact computation via Gauss-Legendre quadrature with O(d m_q) work.