Adalina: Adaptive Linear Approximation for the Shapley Value and Beyond
Pith reviewed 2026-05-10 18:33 UTC · model grok-4.3
The pith
A linear-space algorithm approximates Shapley values and semi-values with O(n/ε² log(1/δ)) queries while minimizing mean squared error for any utility.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By applying a vector concentration inequality to unbiased randomized estimators of semi-values, the authors derive an O(n/ε² log(1/δ)) query complexity that simultaneously guarantees the stated high-probability L2 bound for all common semi-values and permits explicit minimization of mean squared error for any given utility. The resulting adaptive algorithm, Adalina, runs in linear time and linear space and unifies several prior unbiased estimators including OFA, kernelSHAP, and SHAP-IQ while characterizing when paired sampling helps.
What carries the argument
A vector concentration inequality applied to linear-space unbiased estimators of semi-values; it supplies both the query-complexity bound and the mechanism for choosing sampling weights that minimize mean squared error for a concrete utility.
If this is right
- All commonly used semi-values, including Shapley, Banzhaf, and others, receive the same query bound.
- Paired sampling improves the bound only for certain utility functions, which the framework identifies explicitly.
- Mean squared error can be minimized directly for any fixed utility without changing the linear space or time guarantees.
- Existing unbiased methods (OFA, kernelSHAP, SHAP-IQ, regression-adjusted) are recovered as special cases inside the same analysis.
Where Pith is reading between the lines
- The same concentration-based analysis could be applied to other solution concepts in cooperative games that admit linear unbiased estimators.
- In large-scale feature attribution, the adaptive MSE minimization may produce noticeably lower error than non-adaptive baselines once n exceeds a few hundred players.
- A deterministic derandomization of the adaptive weights would remove the remaining logarithmic factor in the failure probability.
Load-bearing premise
The vector concentration inequality must apply with the claimed tightness to the particular semi-value estimators under the linear-space constraint.
What would settle it
Run the algorithm on a sequence of increasing n and decreasing ε and check whether the observed failure rate or mean squared error deviates from the predicted O(n/ε² log(1/δ)) scaling on standard benchmark utility functions.
Figures
read the original abstract
The Shapley value, and its broader family of semi-values, has received much attention in various attribution problems. A fundamental and long-standing challenge is their efficient approximation, since exact computation generally requires an exponential number of utility queries in the number of players $n$. To meet the challenges of large-scale applications, we explore the limits of efficiently approximating semi-values under a $\Theta(n)$ space constraint. Building upon a vector concentration inequality, we establish a theoretical framework that enables sharper query complexities for existing unbiased randomized algorithms. Within this framework, we systematically develop a linear-space algorithm that requires $O(\frac{n}{\epsilon^{2}}\log\frac{1}{\delta})$ utility queries to ensure $P(\|\hat{\boldsymbol\phi}-\boldsymbol\phi\|_{2}\geq\epsilon)\leq \delta$ for all commonly used semi-values. In particular, our framework naturally bridges OFA, unbiased kernelSHAP, SHAP-IQ and the regression-adjusted approach, and definitively characterizes when paired sampling is beneficial. Moreover, our algorithm allows explicit minimization of the mean squared error $\mathbb{E}[\|\hat{\boldsymbol\phi}-\boldsymbol\phi\|_{2}^{2}]$ for each specific utility function. Accordingly, we introduce the first adaptive, linear-time, linear-space randomized algorithm, Adalina, that theoretically achieves improved mean squared error. All of our theoretical findings are experimentally validated. Our code is available at https://github.com/watml/adalina.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper develops a theoretical framework based on vector concentration inequalities to derive sharper query complexities for unbiased randomized algorithms approximating semi-values (including Shapley) under a Θ(n) space constraint. It claims an O(n/ε² log(1/δ)) utility query bound guaranteeing P(‖φ̂ - φ‖₂ ≥ ε) ≤ δ uniformly for common semi-values, introduces the adaptive linear-time linear-space algorithm Adalina that explicitly minimizes MSE for specific utility functions, bridges OFA/kernelSHAP/SHAP-IQ/regression-adjusted methods, characterizes when paired sampling helps, and validates all claims experimentally.
Significance. If the central guarantees hold, the work would be significant for large-scale attribution by delivering the first adaptive linear-time linear-space randomized algorithm with explicit per-utility MSE minimization and uniform high-probability bounds, while providing a unifying framework that explains existing methods and improves upon non-adaptive baselines.
major comments (2)
- [§3] §3 (theoretical framework and query-complexity derivation): the O(n/ε² log(1/δ)) bound is obtained by invoking a vector concentration inequality on the semi-value estimators; however, these estimators are linear combinations of utility evaluations whose sampling distributions depend on the semi-value and whose variances depend on the unknown utility function. The manuscript does not explicitly verify that the adaptive sampling scheme satisfies the coordinate-wise sub-Gaussianity or moment conditions required by the cited inequality, which is load-bearing for both the stated complexity and the uniformity claim across semi-values.
- [Theorem on query complexity] Theorem establishing the uniform bound (around the vector-concentration application): the claim that the same O(n/ε² log(1/δ)) query complexity holds for all commonly used semi-values follows only if the inequality transfers directly to each estimator family; without a per-semi-value check of the required tail conditions under the Θ(n)-space adaptive scheme, the uniformity does not necessarily follow from the cited inequality alone.
minor comments (2)
- [Abstract] Abstract: the statement that 'all of our theoretical findings are experimentally validated' is not accompanied by any description of the experimental protocol (range of n, choice of ε/δ, utility functions, or baselines), making it impossible to assess whether the experiments actually probe the adaptive MSE minimization or the concentration bound.
- [Notation] Notation section: the definitions of the sampling probabilities for each semi-value and the precise form of the linear estimator could be stated more explicitly to allow readers to verify the variance expressions used in the MSE minimization.
Simulated Author's Rebuttal
We thank the referee for their careful and constructive review of our manuscript. The comments on the theoretical framework are well-taken and point to places where additional explicit verification will strengthen the presentation. We address each major comment below and will incorporate the clarifications in the revised version.
read point-by-point responses
-
Referee: [§3] §3 (theoretical framework and query-complexity derivation): the O(n/ε² log(1/δ)) bound is obtained by invoking a vector concentration inequality on the semi-value estimators; however, these estimators are linear combinations of utility evaluations whose sampling distributions depend on the semi-value and whose variances depend on the unknown utility function. The manuscript does not explicitly verify that the adaptive sampling scheme satisfies the coordinate-wise sub-Gaussianity or moment conditions required by the cited inequality, which is load-bearing for both the stated complexity and the uniformity claim across semi-values.
Authors: We thank the referee for identifying this important point. The original manuscript invokes the vector concentration inequality on the unbiased linear estimators after establishing their sampling distributions for each semi-value, but does not include an explicit per-semi-value verification that the adaptive Θ(n)-space scheme preserves the required coordinate-wise sub-Gaussianity or moment bounds. While the adaptation is variance-reducing and the per-coordinate sampling probabilities are fixed by the semi-value definition (independent of the unknown utility), we agree this transfer was not spelled out. In the revision we will add a dedicated subsection (or appendix) that verifies the sub-Gaussian parameters for each common semi-value under the adaptive scheme, confirming that the same O(n/ε² log(1/δ)) bound holds. revision: yes
-
Referee: [Theorem on query complexity] Theorem establishing the uniform bound (around the vector-concentration application): the claim that the same O(n/ε² log(1/δ)) query complexity holds for all commonly used semi-values follows only if the inequality transfers directly to each estimator family; without a per-semi-value check of the required tail conditions under the Θ(n)-space adaptive scheme, the uniformity does not necessarily follow from the cited inequality alone.
Authors: We agree that uniformity across semi-values requires confirming that the tail conditions transfer to each estimator family under the adaptive scheme. The framework derives the bound from the general structure of unbiased linear combinations with semi-value-specific sampling distributions and bounded utility assumptions; the adaptive component minimizes empirical MSE without altering the fixed sampling probabilities per coordinate. To make the uniformity rigorous, the revision will include explicit per-semi-value checks (Shapley, Banzhaf, etc.) showing that the moment conditions are satisfied with constants independent of the utility function, thereby justifying the uniform O(n/ε² log(1/δ)) claim. revision: yes
Circularity Check
No circularity: bounds derived from external inequality and explicit optimization
full rationale
The paper constructs its query complexity O(n/ε² log(1/δ)) and Adalina algorithm by invoking an external vector concentration inequality, then applying it within a new framework to existing unbiased estimators (OFA, kernelSHAP, etc.) and performing explicit MSE minimization over sampling distributions for each utility function. No step equates a claimed prediction or bound to a fitted parameter, self-defined quantity, or self-citation chain by the paper's own equations. The derivation remains independent of the target result and does not rename known patterns or smuggle ansatzes via author citations.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption A vector concentration inequality applies to the unbiased estimators of the semi-values.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Building upon a vector concentration inequality, we establish a theoretical framework that enables sharper query complexities... O(n/ε² log 1/δ) utility queries... Adalina... minimizes the mean square error
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Theorem 2.1 (Vector concentration inequality)... P(‖1/M Σ Xi‖₂ ≥ ε) ≤ 2 exp(−M ε²/(4σ²))
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.