Amortized Linear-time Exact Shapley Value for Product-Kernel Methods
Pith reviewed 2026-05-22 13:13 UTC · model grok-4.3
The pith
Product kernels allow exact Shapley values for all features to be computed in quadratic time.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
For product kernels, a distribution-free removal operator replaces each feature's kernel factor with the constant 1, which defines a unique value function for Shapley attribution. Shared recursive formulations then compute all feature attributions jointly, yielding exact values in quadratic time in the number of features with amortized linear time per feature and numerical stability.
What carries the argument
The distribution-free removal operator intrinsic to the product-kernel structure, where removing a feature replaces its kernel factor with the multiplicative identity 1.
If this is right
- Exact Shapley values become available without any approximation error for product kernel models.
- Feature attributions scale to higher dimensions because the total cost grows only quadratically with the number of features.
- The same removal construction supplies exact attributions for kernel-based discrepancies including MMD and HSIC.
- Recursive sharing of intermediate results guarantees numerical stability across all attributions.
Where Pith is reading between the lines
- The removal operator could be reused to speed up other attribution techniques whenever a model admits a multiplicative decomposition.
- Gaussian process or support-vector models that already employ product kernels could receive exact feature attributions at modest extra cost.
- Similar structure-exploiting ideas might reduce the cost of related explainability methods in settings that are not strictly kernel-based.
Load-bearing premise
The kernel must be a product kernel so that each feature can be removed simply by setting its factor to one.
What would settle it
Run PKeX-Shapley and a brute-force exact Shapley computation on a small dataset with a known product kernel and check whether the two sets of feature attributions match exactly.
read the original abstract
Kernel methods are widely used in machine learning and statistics for their flexibility and expressive power, yet their black-box nature limits adoption in high-stakes applications. Shapley value-based attribution methods such as SHAP, and kernel-specific adaptations including RKHS-SHAP, provide a principled framework for explainability -- but exact computation of Shapley values is generally intractable, forcing existing approaches to rely on approximations that incur unavoidable estimation error. We introduce PKeX-Shapley, an algorithm that exploits the multiplicative structure of product kernels to compute exact Shapley values for all $d$ features in quadratic time in $d$. The method rests on a distribution-free removal operator intrinsic to the product-kernel structure: removing a feature replaces its kernel factor with the multiplicative identity. This yields a parameter-free value function -- requiring no sampling and no density estimation -- and uniquely determines a functional decomposition of the model. Building on this value function, we develop shared recursive formulations that evaluate all feature attributions jointly, achieving amortized linear time per feature with numerical stability. Beyond predictive modeling, the framework extends to widely used kernel-based discrepancies such as the Maximum Mean Discrepancy (MMD) and the Hilbert-Schmidt Independence Criterion (HSIC), providing new tools for interpretable statistical analysis.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces PKeX-Shapley, an algorithm for exact computation of Shapley values in product-kernel methods. It defines a parameter-free value function v(S) by replacing each excluded feature's kernel factor with the multiplicative identity 1, then derives shared recursive formulations that compute all d feature attributions in quadratic time overall (amortized linear per feature) with claimed numerical stability. The approach is extended to kernel discrepancies including MMD and HSIC, and is positioned as providing a unique functional decomposition of the model output.
Significance. If the central construction is sound, the result would be significant: exact, sampling-free Shapley attributions for a broad class of kernel methods at practical cost, together with extensions to statistical discrepancies. The parameter-free and distribution-free character, plus the explicit recursion for joint evaluation, would constitute a concrete advance over approximation-based methods such as SHAP or RKHS-SHAP for this kernel family.
major comments (1)
- [§3.2 and §4.1] §3.2 (Value Function Definition) and §4.1 (Efficiency Axiom): the manuscript must explicitly verify that the defined v(S) satisfies the efficiency axiom for a general model f that is a non-linear functional of the kernel (e.g., kernel ridge regression or SVM decision function). The abstract claims a 'unique functional decomposition,' but the provided derivation sketch does not show that sum of attributions recovers f(x) minus the all-1 baseline once the recursion is applied; this verification is load-bearing for the claim that the quantities are Shapley values of the intended game.
minor comments (2)
- [§3] Notation for the removal operator and the recursive base cases should be introduced with a small worked example (d=2 or d=3) to make the shared recursion concrete before the general proof.
- [§4.3] The numerical-stability claim in the abstract would benefit from a short forward-error analysis or condition-number bound for the recursion, even if only in the supplementary material.
Simulated Author's Rebuttal
We thank the referee for the careful reading and for highlighting the need to strengthen the verification of the efficiency axiom. We agree that an explicit demonstration is warranted for general non-linear functionals of the kernel and will revise the manuscript to include it.
read point-by-point responses
-
Referee: [§3.2 and §4.1] §3.2 (Value Function Definition) and §4.1 (Efficiency Axiom): the manuscript must explicitly verify that the defined v(S) satisfies the efficiency axiom for a general model f that is a non-linear functional of the kernel (e.g., kernel ridge regression or SVM decision function). The abstract claims a 'unique functional decomposition,' but the provided derivation sketch does not show that sum of attributions recovers f(x) minus the all-1 baseline once the recursion is applied; this verification is load-bearing for the claim that the quantities are Shapley values of the intended game.
Authors: We agree that an explicit verification is required. By definition of the Shapley value, efficiency holds for any value function v: the sum of attributions equals v(full set) − v(∅). In our construction, v(full set) is exactly the model output f(x) because no kernel factors are replaced, while v(∅) is the baseline obtained by replacing every factor with the multiplicative identity 1. This identity holds regardless of whether f is linear or a non-linear functional of the kernel (e.g., the decision function of an SVM or the predictor of kernel ridge regression), because the removal operator acts on the kernel before f is applied. The shared recursion is merely an efficient, numerically stable implementation of the standard Shapley formula; we will add a short inductive proof in §4.1 showing that the recursion preserves the telescoping sum property. A concrete numerical check on a kernel SVM will also be included to illustrate recovery of f(x) minus the all-1 baseline. revision: yes
Circularity Check
No significant circularity: derivation is self-contained algorithmic construction from explicit value-function definition
full rationale
The paper defines a parameter-free value function v(S) directly from the product-kernel structure by replacing each excluded feature's kernel factor with the multiplicative identity 1. It then derives shared recursive formulations that evaluate all feature attributions jointly from this definition. No fitted parameters, no reduction of predictions to prior fitted quantities, and no load-bearing self-citation chain are present in the core construction. The resulting algorithm computes exact Shapley values for the explicitly defined game, rendering the derivation self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption The kernel is a product kernel whose factors multiply independently across features.
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/ArithmeticFromLogic.leanembed_add / LogicNat multiplicative homomorphism echoes?
echoesECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
Definition 1. ... νxxx(S) = α⊤ kS(XS, xxxS). ... removing a feature replaces its kernel factor with the multiplicative identity.
-
IndisputableMonolith/Foundation/ArithmeticFromLogic.leannone (efficiency is game-theoretic, not RS) unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Lemma 4. ... sum of Shapley values satisfies Pd j=1 ϕxxx j = f(xxx) − f∅(xxx) where f∅(xxx) = Pn i=1 αi
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 3 Pith papers
-
QuadraSHAP: Stable and Scalable Shapley Values for Product Games via Gauss-Legendre Quadrature
Shapley values in product games equal an exact one-dimensional integral of a polynomial, computable via Gauss-Legendre quadrature with linear cost in the number of features.
-
QuadraSHAP: Stable and Scalable Shapley Values for Product Games via Gauss-Legendre Quadrature
Shapley values in product games equal the integral of a degree-(d-1) polynomial over [0,1], allowing provably exact or near-exact computation via Gauss-Legendre quadrature with O(d m_q) work.
-
Proxy-Based Approximation of Shapley and Banzhaf Interactions
ProxySHAP uses tree proxies plus residual correction to achieve state-of-the-art approximation of Shapley and Banzhaf interactions, with a polynomial-time exact method for tree ensembles.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.