Amortized Linear-time Exact Shapley Value for Product-Kernel Methods

Krikamol Muandet; Majid Mohammadi; Siu Lun Chau

arXiv: 2505.16516 · v3 · submitted 2025-05-22 · 💻 cs.LG · cs.AI

Majid Mohammadi, Siu Lun Chau, Krikamol Muandet

Pith reviewed 2026-05-22 13:13 UTC · model grok-4.3

keywords: Shapley values · product kernels · explainable AI · kernel methods · Maximum Mean Discrepancy · Hilbert-Schmidt Independence Criterion · feature attribution

The pith

Product kernels allow exact Shapley values for all features to be computed in quadratic time.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents an algorithm called PKeX-Shapley that computes exact Shapley values for models built on product kernels. It defines a removal operation that replaces the kernel factor for any one feature with the constant 1, which creates a parameter-free value function without sampling or density estimation. Shared recursive calculations then produce attributions for every feature at once, giving quadratic time overall and amortized linear time per feature along with numerical stability. The same removal idea extends the framework to kernel discrepancies such as MMD and HSIC.

Core claim

For product kernels, a distribution-free removal operator replaces each feature's kernel factor with the constant 1, which defines a unique value function for Shapley attribution. Shared recursive formulations then compute all feature attributions jointly, yielding exact values in quadratic time in the number of features with amortized linear time per feature and numerical stability.

What carries the argument

The distribution-free removal operator intrinsic to the product-kernel structure, where removing a feature replaces its kernel factor with the multiplicative identity 1.
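A minimal sketch of the removal operator, assuming a model of the form f(x) = sum_i alpha_i k(x_i, x) with a product of per-feature RBF factors. The RBF choice and the names `rbf_factor`, `value_function`, `X`, `alpha`, and `lengthscale` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def rbf_factor(a, b, lengthscale=1.0):
    """Per-feature RBF kernel factor (an assumed choice of factor)."""
    return np.exp(-0.5 * ((a - b) / lengthscale) ** 2)

def value_function(S, X, alpha, x, lengthscale=1.0):
    """v(S) = sum_i alpha_i * prod_{j in S} k_j(X[i, j], x[j]).

    Features outside S have their factor replaced by the constant 1,
    so no sampling or density estimate is involved.
    """
    d = X.shape[1]
    factors = rbf_factor(X, x, lengthscale)            # shape (n, d)
    keep = np.array([j in S for j in range(d)], dtype=bool)
    kept = np.where(keep, factors, 1.0)                # removed factors -> 1
    return float(alpha @ kept.prod(axis=1))

# Toy data (hypothetical): 5 training points, 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
alpha = rng.normal(size=5)
x = rng.normal(size=3)

v_empty = value_function(set(), X, alpha, x)       # every factor removed
v_full = value_function({0, 1, 2}, X, alpha, x)    # no factor removed
```

With every factor removed the value collapses to the sum of the coefficients, and with none removed it is the model output at `x`.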

If this is right

Exact Shapley values become available without any approximation error for product kernel models.
Feature attributions scale to higher dimensions because the total cost grows only quadratically with the number of features.
The same removal construction supplies exact attributions for kernel-based discrepancies including MMD and HSIC.
The shared recursive formulation evaluates all attributions jointly and, per the abstract, does so with numerical stability.
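To make the quadratic cost plausible, the sketch below uses a textbook identity for product games: the Shapley value of feature j is (z_j - 1) times a weighted sum of elementary symmetric polynomials of the other factors. This is not the paper's PKeX-Shapley recursion. The function name and the matrix `Z` of precomputed kernel factors are assumptions for illustration, and the subtraction-based update may lose precision, so it does not reproduce the stability claim.

```python
import numpy as np
from math import factorial

def shapley_product_kernel(Z, alpha):
    """Shapley values for v(S) = sum_i alpha_i * prod_{j in S} Z[i, j].

    Z[i, j] is the j-th kernel factor between training point i and the
    query point. Cost is O(n d^2) for all d features together.
    """
    n, d = Z.shape
    # E[:, k] = e_k(Z[i, :]): coefficients of prod_l (1 + Z[i, l] * t).
    E = np.zeros((n, d + 1))
    E[:, 0] = 1.0
    for l in range(d):
        E[:, 1:l + 2] = E[:, 1:l + 2] + Z[:, [l]] * E[:, 0:l + 1]
    # Shapley weight for coalitions of size k among the other d - 1 features.
    w = np.array([factorial(k) * factorial(d - 1 - k) / factorial(d)
                  for k in range(d)])
    phi = np.zeros(d)
    for j in range(d):
        # Divide out the factor (1 + z_j t): e_k without feature j.
        e = np.empty((n, d))
        e[:, 0] = 1.0
        for k in range(1, d):
            e[:, k] = E[:, k] - Z[:, j] * e[:, k - 1]
        phi[j] = alpha @ ((Z[:, j] - 1.0) * (e @ w))
    return phi

# Toy inputs (hypothetical): 6 training points, 5 features.
rng = np.random.default_rng(0)
Z = rng.uniform(0.1, 1.0, size=(6, 5))
alpha = rng.normal(size=6)
phi = shapley_product_kernel(Z, alpha)
```

The polynomial is built once and reused for every feature, which is where the linear amortized cost per feature comes from in this sketch.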

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The removal operator could be reused to speed up other attribution techniques whenever a model admits a multiplicative decomposition.
Gaussian process or support-vector models that already employ product kernels could receive exact feature attributions at modest extra cost.
Similar structure-exploiting ideas might reduce the cost of related explainability methods in settings that are not strictly kernel-based.

Load-bearing premise

The kernel must be a product kernel so that each feature can be removed simply by setting its factor to one.

What would settle it

Run PKeX-Shapley and a brute-force exact Shapley computation on a small dataset with a known product kernel and check whether the two sets of feature attributions match exactly.
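The brute-force side of that check can be sketched as follows, assuming the value function v(S) = sum_i alpha_i prod_{j in S} Z[i, j] with precomputed kernel factors `Z`. The names `brute_force_shapley`, `value`, `Z`, and `alpha` are illustrative, and the PKeX-Shapley output to compare against is not reproduced here.

```python
import numpy as np
from itertools import combinations
from math import factorial

def brute_force_shapley(value, d):
    """Exact Shapley values by enumerating every coalition: O(2^d) calls."""
    phi = np.zeros(d)
    for j in range(d):
        others = [l for l in range(d) if l != j]
        for size in range(d):
            weight = factorial(size) * factorial(d - 1 - size) / factorial(d)
            for S in combinations(others, size):
                phi[j] += weight * (value(S + (j,)) - value(S))
    return phi

# Toy product-kernel game (hypothetical data): 6 training points, 4 features.
rng = np.random.default_rng(1)
n, d = 6, 4
Z = rng.uniform(0.1, 1.0, size=(n, d))   # Z[i, j]: j-th kernel factor
alpha = rng.normal(size=n)

def value(S):
    """Removed features contribute the factor 1 (an empty product)."""
    return float(alpha @ Z[:, list(S)].prod(axis=1))

reference = brute_force_shapley(value, d)
```

Because both computations run in floating point, a comparison such as `np.allclose(reference, fast_result)` is a more realistic test than bitwise equality.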

Original abstract

Kernel methods are widely used in machine learning and statistics for their flexibility and expressive power, yet their black-box nature limits adoption in high-stakes applications. Shapley value-based attribution methods such as SHAP, and kernel-specific adaptations including RKHS-SHAP, provide a principled framework for explainability -- but exact computation of Shapley values is generally intractable, forcing existing approaches to rely on approximations that incur unavoidable estimation error. We introduce PKeX-Shapley, an algorithm that exploits the multiplicative structure of product kernels to compute exact Shapley values for all $d$ features in quadratic time in $d$. The method rests on a distribution-free removal operator intrinsic to the product-kernel structure: removing a feature replaces its kernel factor with the multiplicative identity. This yields a parameter-free value function -- requiring no sampling and no density estimation -- and uniquely determines a functional decomposition of the model. Building on this value function, we develop shared recursive formulations that evaluate all feature attributions jointly, achieving amortized linear time per feature with numerical stability. Beyond predictive modeling, the framework extends to widely used kernel-based discrepancies such as the Maximum Mean Discrepancy (MMD) and the Hilbert-Schmidt Independence Criterion (HSIC), providing new tools for interpretable statistical analysis.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

PKeX-Shapley gives exact Shapley values for product-kernel models by setting excluded factors to 1 and using shared recursion, but the efficiency axiom still needs explicit confirmation for non-linear cases like kernel ridge regression.

The paper's core move is to treat product kernels as multiplicative, so removing a feature just replaces its factor with the constant 1. This creates a parameter-free value function that needs no sampling or density estimates. From there they build a recursive scheme that computes all d attributions together in quadratic time overall, which works out to amortized linear time per feature plus a claim of numerical stability.

That combination looks new compared with the approximation-heavy methods cited in the abstract, and the extension to MMD and HSIC is a straightforward bonus that could matter for discrepancy-based stats work. The algebra is clean on paper and the absence of fitted parameters avoids the usual circularity problems in attribution methods.

The main soft spot is whether the resulting attributions actually satisfy efficiency once the model is a non-linear functional of the kernel. For a plain kernel mean it might hold by construction, but the abstract does not show a direct verification that the sum recovers f(x) minus the all-1 baseline when the model is kernel ridge regression or an SVM decision function. The recursion is also asserted to preserve the sum, yet no edge-case check or small-scale proof is visible in the summary. If that property fails, the numbers are not Shapley values for the game people care about.

This is aimed at researchers who already use product kernels and want exact rather than approximate attributions. A reader who knows both kernel algebra and cooperative games would get the most out of it. The work is coherent enough on its own terms to deserve referee time, even if the efficiency step turns out to need extra proof or counter-examples.

Referee Report

1 major / 2 minor

Summary. The paper introduces PKeX-Shapley, an algorithm for exact computation of Shapley values in product-kernel methods. It defines a parameter-free value function v(S) by replacing each excluded feature's kernel factor with the multiplicative identity 1, then derives shared recursive formulations that compute all d feature attributions in quadratic time overall (amortized linear per feature) with claimed numerical stability. The approach is extended to kernel discrepancies including MMD and HSIC, and is positioned as providing a unique functional decomposition of the model output.

Significance. If the central construction is sound, the result would be significant: exact, sampling-free Shapley attributions for a broad class of kernel methods at practical cost, together with extensions to statistical discrepancies. The parameter-free and distribution-free character, plus the explicit recursion for joint evaluation, would constitute a concrete advance over approximation-based methods such as SHAP or RKHS-SHAP for this kernel family.

major comments (1)

[§3.2 and §4.1] §3.2 (Value Function Definition) and §4.1 (Efficiency Axiom): the manuscript must explicitly verify that the defined v(S) satisfies the efficiency axiom for a general model f that is a non-linear functional of the kernel (e.g., kernel ridge regression or SVM decision function). The abstract claims a 'unique functional decomposition,' but the provided derivation sketch does not show that the sum of attributions recovers f(x) minus the all-1 baseline once the recursion is applied; this verification is load-bearing for the claim that the quantities are Shapley values of the intended game.

minor comments (2)

[§3] Notation for the removal operator and the recursive base cases should be introduced with a small worked example (d=2 or d=3) to make the shared recursion concrete before the general proof.
[§4.3] The numerical-stability claim in the abstract would benefit from a short forward-error analysis or condition-number bound for the recursion, even if only in the supplementary material.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful reading and for highlighting the need to strengthen the verification of the efficiency axiom. We agree that an explicit demonstration is warranted for general non-linear functionals of the kernel and will revise the manuscript to include it.

Point-by-point responses

Referee: [§3.2 and §4.1] §3.2 (Value Function Definition) and §4.1 (Efficiency Axiom): the manuscript must explicitly verify that the defined v(S) satisfies the efficiency axiom for a general model f that is a non-linear functional of the kernel (e.g., kernel ridge regression or SVM decision function). The abstract claims a 'unique functional decomposition,' but the provided derivation sketch does not show that the sum of attributions recovers f(x) minus the all-1 baseline once the recursion is applied; this verification is load-bearing for the claim that the quantities are Shapley values of the intended game.

Authors: We agree that an explicit verification is required. By definition of the Shapley value, efficiency holds for any value function v: the sum of attributions equals v(full set) − v(∅). In our construction, v(full set) is exactly the model output f(x) because no kernel factors are replaced, while v(∅) is the baseline obtained by replacing every factor with the multiplicative identity 1. This identity holds regardless of whether f is linear or a non-linear functional of the kernel (e.g., the decision function of an SVM or the predictor of kernel ridge regression), because the removal operator acts on the kernel before f is applied. The shared recursion is merely an efficient, numerically stable implementation of the standard Shapley formula; we will add a short inductive proof in §4.1 showing that the recursion preserves the telescoping sum property. A concrete numerical check on a kernel SVM will also be included to illustrate recovery of f(x) minus the all-1 baseline. revision: yes
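As an illustration of the identity invoked in this response, the case d = 2 can be written out by hand. The derivation assumes a model of the form f(x) = sum_i alpha_i k(x_i, x) with two kernel factors. The shorthand a_i and b_i for the first and second factor at training point i is introduced here for illustration and does not appear in the paper.

```latex
\begin{align*}
v(\emptyset) &= \sum_{i=1}^{n} \alpha_i, &
v(\{1\}) &= \sum_{i=1}^{n} \alpha_i a_i, &
v(\{2\}) &= \sum_{i=1}^{n} \alpha_i b_i, &
v(\{1,2\}) &= \sum_{i=1}^{n} \alpha_i a_i b_i = f(\mathbf{x}), \\[4pt]
\phi_1 &= \tfrac{1}{2}\bigl[v(\{1\}) - v(\emptyset)\bigr]
        + \tfrac{1}{2}\bigl[v(\{1,2\}) - v(\{2\})\bigr]
        = \tfrac{1}{2}\sum_{i=1}^{n} \alpha_i (a_i - 1)(1 + b_i), \\
\phi_2 &= \tfrac{1}{2}\bigl[v(\{2\}) - v(\emptyset)\bigr]
        + \tfrac{1}{2}\bigl[v(\{1,2\}) - v(\{1\})\bigr]
        = \tfrac{1}{2}\sum_{i=1}^{n} \alpha_i (b_i - 1)(1 + a_i), \\[4pt]
\phi_1 + \phi_2 &= \sum_{i=1}^{n} \alpha_i (a_i b_i - 1)
        = f(\mathbf{x}) - \sum_{i=1}^{n} \alpha_i .
\end{align*}
```

The cross terms cancel, leaving the model output minus the baseline in which every factor equals 1.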

Circularity Check

0 steps flagged

No significant circularity: derivation is self-contained algorithmic construction from explicit value-function definition

full rationale

The paper defines a parameter-free value function v(S) directly from the product-kernel structure by replacing each excluded feature's kernel factor with the multiplicative identity 1. It then derives shared recursive formulations that evaluate all feature attributions jointly from this definition. No fitted parameters, no reduction of predictions to prior fitted quantities, and no load-bearing self-citation chain are present in the core construction. The resulting algorithm computes exact Shapley values for the explicitly defined game, rendering the derivation self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that the kernel factors multiplicatively and that the identity replacement constitutes a valid removal operator; no free parameters or new entities are introduced.

axioms (1)

Domain assumption: The kernel is a product kernel whose factors multiply independently across features.
This property is required for the removal operator to replace a feature factor with the multiplicative identity.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · embed_add / LogicNat multiplicative homomorphism · echoes

echoes: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

Definition 1. ... $\nu_{\mathbf{x}}(S) = \boldsymbol{\alpha}^\top k_S(\mathbf{X}_S, \mathbf{x}_S)$. ... removing a feature replaces its kernel factor with the multiplicative identity.

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · none (efficiency is game-theoretic, not RS) · unclear

Lemma 4. ... sum of Shapley values satisfies $\sum_{j=1}^{d} \phi^{\mathbf{x}}_j = f(\mathbf{x}) - f_\emptyset(\mathbf{x})$ where $f_\emptyset(\mathbf{x}) = \sum_{i=1}^{n} \alpha_i$

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

QuadraSHAP: Stable and Scalable Shapley Values for Product Games via Gauss-Legendre Quadrature
cs.LG 2026-05 unverdicted novelty 7.0

Shapley values in product games equal an exact one-dimensional integral of a polynomial, computable via Gauss-Legendre quadrature with linear cost in the number of features.

QuadraSHAP: Stable and Scalable Shapley Values for Product Games via Gauss-Legendre Quadrature
cs.LG 2026-05 conditional novelty 7.0

Shapley values in product games equal the integral of a degree-(d-1) polynomial over [0,1], allowing provably exact or near-exact computation via Gauss-Legendre quadrature with O(d m_q) work.

Proxy-Based Approximation of Shapley and Banzhaf Interactions
cs.LG 2026-05 unverdicted novelty 6.0

ProxySHAP uses tree proxies plus residual correction to achieve state-of-the-art approximation of Shapley and Banzhaf interactions, with a polynomial-time exact method for tree ensembles.