First-Order Efficiency for Probabilistic Value Estimation via A Statistical Viewpoint
Pith reviewed 2026-05-08 18:11 UTC · model grok-4.3
The pith
A shared first-order error structure lets one optimize sampling and surrogate to cut leading MSE in probabilistic value estimates.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Existing identification strategies for probabilistic values share a common first-order error structure consisting of an augmented inverse-probability weighted influence term whose form is fixed by the sampling law and a working surrogate. This representation supplies an explicit expression for the leading mean squared error, which depends jointly on the sampling distribution and the surrogate. The Efficiency-Aware Surrogate-adjusted Estimator (EASE) selects both quantities to minimize the leading MSE and thereby achieves lower error than prior methods for the same number of utility evaluations.
What carries the argument
The augmented inverse-probability weighted influence term, which isolates the dominant error in Monte Carlo approximations of probabilistic values and directly determines the first-order mean squared error.
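To make the shape of such a term concrete, here is a minimal Python sketch of a surrogate-adjusted, inverse-probability weighted estimate of one player's Shapley value. It is an illustration under stated assumptions, not the paper's EASE procedure: the function names (`aipw_value`, `shapley_weight`), the size-stratified sampling law `q_size`, and the toy game are all hypothetical, and the surrogate is summed exactly by enumeration only because the toy game is small.

```python
import itertools
import math
import random


def shapley_weight(n, k):
    """Shapley probability of one specific size-k coalition of the other n-1 players."""
    return math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)


def aipw_value(utility, surrogate, n, i, q_size, m, rng):
    """Surrogate-adjusted (AIPW-style) Monte Carlo estimate of player i's Shapley value.

    utility(S)   -> float, the expensive black-box utility of a frozenset S
    surrogate(S) -> float, a cheap guess h(S) of i's marginal contribution to S
    q_size[k]    -> probability of sampling size k (uniform within a size); assumed design
    """
    others = [j for j in range(n) if j != i]

    # Term 1: exact Shapley-weighted sum of the surrogate (enumeration, toy scale only).
    plug_in = 0.0
    for k in range(n):
        for S in itertools.combinations(others, k):
            plug_in += shapley_weight(n, k) * surrogate(frozenset(S))

    # Term 2: inverse-probability weighted average of the residuals delta - h.
    correction = 0.0
    for _ in range(m):
        k = rng.choices(range(n), weights=q_size)[0]
        S = frozenset(rng.sample(others, k))
        p = shapley_weight(n, k)
        q = q_size[k] / math.comb(n - 1, k)
        delta = utility(S | {i}) - utility(S)
        correction += (p / q) * (delta - surrogate(S))
    return plug_in + correction / m


def toy_utility(S):
    """Hypothetical symmetric game: utility is the squared coalition size."""
    return float(len(S) ** 2)


rng = random.Random(0)
uniform_sizes = [1 / 5] * 5
plain = aipw_value(toy_utility, lambda S: 0.0, 5, 0, uniform_sizes, 200, rng)
adjusted = aipw_value(toy_utility, lambda S: 2 * len(S) + 1, 5, 0, uniform_sizes, 200, rng)
print(plain, adjusted)
```

In this toy game the marginal contribution to a size-k coalition is 2k + 1, so the second surrogate is exact: every residual is zero and the estimate equals the plug-in term (the true value 5, up to rounding), while the zero surrogate leaves all the sampling noise in place.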
If this is right
- EASE produces lower mean squared error than prior estimators while using the same number of model evaluations.
- The first-order MSE formula gives a concrete criterion for jointly tuning sampling probabilities and the surrogate function.
- Any new estimator can be assessed by how closely its sampling law and surrogate match the optimal choice under this criterion.
- The same error decomposition applies across different probabilistic values including semivalues.
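For orientation, the kind of criterion these points refer to can be written out for the simplest case. The block below is a textbook importance-sampling calculation under assumed notation, with a fixed (non-random) surrogate and i.i.d. draws; the paper's own first-order MSE expression is not reproduced in this summary and may differ in form.

```latex
% Notation assumed for this sketch (not taken from the paper):
%   p(S): probabilistic-value weight of coalition S, with \sum_S p(S) = 1
%   q(S): sampling law, q(S) > 0 wherever p(S) r(S) \neq 0
%   \Delta(S): marginal contribution, h(S): fixed surrogate, r(S) = \Delta(S) - h(S)
\begin{align*}
\hat\phi &= \sum_S p(S)\,h(S)
  + \frac{1}{m}\sum_{j=1}^{m}\frac{p(S_j)}{q(S_j)}\,r(S_j),
  \qquad S_1,\dots,S_m \overset{\text{iid}}{\sim} q, \\
m\cdot\operatorname{MSE}(\hat\phi) &= \sum_S \frac{p(S)^2\,r(S)^2}{q(S)}
  - \Big(\sum_S p(S)\,r(S)\Big)^2, \\
q^\star(S) &= \frac{p(S)\,|r(S)|}{\sum_{T} p(T)\,|r(T)|}
  \quad\text{(by Cauchy--Schwarz, for fixed } h\text{)}.
\end{align*}
```

Read this way, a better surrogate shrinks the residuals r(S), and a better sampling law places mass where the weighted residuals are large; both levers act on the same expression.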
Where Pith is reading between the lines
- The first-order viewpoint may extend to variance-reduced sampling schemes that incorporate higher moments of the influence function.
- The framework could guide adaptive sampling rules that update the distribution online based on observed residuals.
- Similar first-order analysis might improve Monte Carlo methods used in related areas such as causal effect estimation or sensitivity analysis.
Load-bearing premise
That the first-order error term dominates the overall mean squared error, and that directly minimizing it therefore produces real gains before higher-order terms become important.
What would settle it
An experiment that records the empirical mean squared error of EASE versus current estimators on fixed-budget Monte Carlo runs for Shapley values on standard datasets and checks whether the observed error ordering matches the ordering predicted by the first-order MSE formula.
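A toy version of such a check can be sketched in Python. Everything here is an assumption made for illustration: the concave toy game, the two hypothetical surrogates, and sampling directly from the Shapley distribution. Because the surrogates are fixed in advance, the variance formula is exact in this sketch, so it only demonstrates the mechanics of comparing predicted and empirical orderings; it says nothing about EASE or about fitted surrogates.

```python
import itertools
import math
import random
import statistics


def marginal(S, i, weights):
    """Marginal contribution of player i to coalition S in a toy concave game."""
    base = sum(weights[j] for j in S)
    return math.sqrt(base + weights[i]) - math.sqrt(base)


def enumerate_coalitions(n, i):
    """Yield (S, Shapley probability of S) for every coalition S not containing i."""
    others = [j for j in range(n) if j != i]
    for k in range(n):
        p = 1.0 / (n * math.comb(n - 1, k))
        for S in itertools.combinations(others, k):
            yield S, p


def run_experiment(n=6, i=0, m=50, reps=2000, seed=0):
    """Return {surrogate name: (predicted MSE, empirical MSE)} for a fixed budget m."""
    rng = random.Random(seed)
    weights = [1.0 + j for j in range(n)]
    table = [(S, p, marginal(S, i, weights)) for S, p in enumerate_coalitions(n, i)]
    probs = [p for _, p, _ in table]
    truth = sum(p * d for _, p, d in table)

    # Two hypothetical fixed surrogates: none, and a guess that uses only |S|.
    surrogates = {
        "no surrogate": lambda S: 0.0,
        "size-only surrogate": lambda S: math.sqrt(4 * len(S) + 1) - math.sqrt(4 * len(S)),
    }

    results = {}
    for name, h in surrogates.items():
        plug_in = sum(p * h(S) for S, p, _ in table)
        resid = [d - h(S) for S, _, d in table]
        mean_r = sum(p * r for p, r in zip(probs, resid))
        # Predicted MSE: variance of one residual under the Shapley law, divided by m.
        predicted = sum(p * (r - mean_r) ** 2 for p, r in zip(probs, resid)) / m

        sq_errors = []
        for _ in range(reps):
            # Drawing a residual with probability p(S) equals sampling S from the Shapley law.
            draws = rng.choices(resid, weights=probs, k=m)
            estimate = plug_in + sum(draws) / m
            sq_errors.append((estimate - truth) ** 2)
        results[name] = (predicted, statistics.fmean(sq_errors))
    return results


for name, (predicted, empirical) in run_experiment().items():
    print(f"{name}: predicted={predicted:.3e} empirical={empirical:.3e}")
```

A real version of the test would replace the toy game with model utilities on standard datasets, use surrogates fitted from the same budget, and look for cases where the empirical ordering departs from the predicted one.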
Original abstract
Probabilistic values, including Shapley values and semivalues, provide a model-agnostic framework to attribute the behavior of a black-box model to data points or features, with a wide range of applications including explainable artificial intelligence and data valuation. However, their exact computation requires utility evaluations over exponentially many coalitions, making Monte Carlo approximation essential in modern machine learning applications. Existing estimators are often developed through different identification strategies, including weighted averages, self-normalized weighting, regression adjustment, and weighted least squares. Our key observation is that these seemingly distinct constructions share a common first-order error structure, in which the leading term is an augmented inverse-probability weighted influence term determined by the sampling law and a working surrogate function. This first-order representation yields an explicit expression for the leading mean squared error (MSE), which characterizes how the sampling law and the surrogate jointly determine statistical efficiency. Guided by this criterion, we propose an Efficiency-Aware Surrogate-adjusted Estimator (EASE) that directly chooses the sampling law and surrogate to minimize the first-order MSE. We demonstrate that EASE consistently outperforms state-of-the-art estimators for various probabilistic values.
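The exponential cost mentioned in the abstract is easy to see in a brute-force implementation. The Python sketch below computes a probabilistic value exactly by enumerating every coalition; the function names and the toy game are hypothetical, and the two weight functions are the standard Shapley and Banzhaf weights expressed per coalition of a given size.

```python
import itertools
import math


def exact_probabilistic_value(utility, n, i, size_weight):
    """Exact value of player i: the sum over all S not containing i of
    size_weight(n, |S|) * (utility(S with i) - utility(S)).

    The loop visits 2**(n-1) coalitions and makes two utility calls per coalition.
    """
    others = [j for j in range(n) if j != i]
    total = 0.0
    for k in range(n):
        w = size_weight(n, k)
        for S in itertools.combinations(others, k):
            S = frozenset(S)
            total += w * (utility(S | {i}) - utility(S))
    return total


def shapley_weight(n, k):
    """Shapley: sizes are equally likely, coalitions uniform within a size."""
    return 1.0 / (n * math.comb(n - 1, k))


def banzhaf_weight(n, k):
    """Banzhaf: every coalition of the other n-1 players is equally likely."""
    return 1.0 / 2 ** (n - 1)


def cubic_utility(S):
    """Hypothetical toy game: utility is the cubed coalition size."""
    return float(len(S) ** 3)


shapley = exact_probabilistic_value(cubic_utility, 4, 0, shapley_weight)
banzhaf = exact_probabilistic_value(cubic_utility, 4, 0, banzhaf_weight)
print(shapley, banzhaf)
```

With four players this costs 8 coalitions per player; with 30 players it would already be more than 500 million, which is why sampling-based estimators are the practical route.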
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims that estimators for probabilistic values (Shapley values and semivalues) share a common first-order error structure given by an augmented inverse-probability weighted influence term determined jointly by the sampling law and a working surrogate function. This representation produces an explicit leading MSE expression that is minimized over sampling laws and surrogates to obtain the Efficiency-Aware Surrogate-adjusted Estimator (EASE), which is shown to outperform existing methods across various probabilistic values.
Significance. If the first-order approximation is accurate and the resulting EASE yields measurable finite-sample gains, the work supplies a unified statistical lens for designing efficient Monte Carlo estimators of probabilistic values. This could meaningfully reduce the computational burden of model explanations and data valuation tasks by providing a principled optimization criterion rather than ad-hoc constructions.
major comments (2)
- [Abstract (first-order representation and MSE minimization)] The central derivation assumes that the first-order error structure (augmented IPW influence term) dominates the actual MSE, yet no explicit bound on the higher-order remainder or identification of regimes (e.g., sample size, utility regularity) where this holds is supplied. Without such control, minimizing the leading term does not guarantee that EASE improves upon estimators whose remainders differ.
- [Experimental validation and EASE construction] The outperformance claim for EASE is load-bearing for the contribution, but the manuscript provides no analysis of whether the optimized sampling law and surrogate remain effective once higher-order effects or finite-sample bias from the surrogate are included; this leaves the practical utility of the first-order criterion unverified.
minor comments (1)
- [Notation and setup] Clarify the precise definition of the surrogate function and its dependence on the sampling law in the notation for the influence term.
Simulated Author's Rebuttal
We thank the referee for their insightful comments on our work. We provide point-by-point responses to the major comments and outline the revisions we intend to make to strengthen the manuscript.
- Referee: The central derivation assumes that the first-order error structure (augmented IPW influence term) dominates the actual MSE, yet no explicit bound on the higher-order remainder or identification of regimes (e.g., sample size, utility regularity) where this holds is supplied. Without such control, minimizing the leading term does not guarantee that EASE improves upon estimators whose remainders differ.
  Authors: We acknowledge the validity of this observation. The manuscript derives the leading MSE term but does not provide explicit bounds on the higher-order remainder. This is a common approach in developing asymptotically efficient estimators, where the focus is on the dominant term. To address the concern, we will add a discussion section that outlines the conditions (such as large sample sizes and regular utility functions) under which the first-order approximation is reliable, drawing from standard results in semiparametric statistics. We will also note that the empirical results support the practical benefits even in moderate sample regimes.
  Revision: partial
- Referee: The outperformance claim for EASE is load-bearing for the contribution, but the manuscript provides no analysis of whether the optimized sampling law and surrogate remain effective once higher-order effects or finite-sample bias from the surrogate are included; this leaves the practical utility of the first-order criterion unverified.
  Authors: We agree that further verification of the criterion's robustness to higher-order effects is important. While the current experiments demonstrate outperformance, they do not specifically dissect the contribution of higher-order terms. In the revised manuscript, we will incorporate additional numerical studies that assess the performance of EASE under varying conditions, including different sample sizes and surrogate estimation procedures, to confirm that the first-order optimization translates to finite-sample gains. This will help verify the practical utility.
  Revision: yes
Circularity Check
No significant circularity; first-order MSE derivation is forward statistical analysis
The paper begins by observing that existing estimators (weighted averages, self-normalized weighting, regression adjustment, weighted least squares) share a common first-order error structure consisting of an augmented inverse-probability weighted influence term. From this structure it derives an explicit leading MSE expression that depends on the sampling law and surrogate. It then minimizes that expression to define the EASE estimator. This is a standard forward derivation of an efficiency criterion followed by an empirical demonstration of outperformance; it does not reduce any claimed result to a fitted parameter by construction, rename a known quantity, or rely on a load-bearing self-citation whose content is unverified. No equation is shown to be equivalent to its own inputs, and the central claim remains independent of the optimization step itself.
Axiom & Free-Parameter Ledger
free parameters (1)
- surrogate function
axioms (1)
- domain assumption: Existing estimators share a common first-order error structure given by an augmented inverse-probability weighted influence term.
Lean theorems connected to this paper
- Foundation/Cost (Jcost = ½(x + x⁻¹) − 1), theorem washburn_uniqueness_aczel: RS forces J as the unique ratio-symmetric calibrated cost; the paper's sqrt-of-residual-moment allocation is unrelated to J or to ratio symmetry.
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "residual-aware oracle law can be viewed as a Neyman-type allocation ... q*(S; h) ∝ sqrt(M_k(h))"
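The quoted Neyman-type allocation can be illustrated with a short Python sketch over coalition sizes. The definitions are assumptions, since this summary does not give the paper's: `size_mass[k]` is taken to be the total probabilistic-value mass on size-k coalitions, `residual_second_moment[k]` stands in for M_k(h) as the mean squared residual among size-k coalitions, and the sizes are sampled with probability proportional to mass times the square root of that moment. The paper's own normalization of M_k(h) may already absorb the mass factor.

```python
import math


def neyman_size_allocation(size_mass, residual_second_moment):
    """Return q_k proportional to size_mass[k] * sqrt(residual_second_moment[k])."""
    raw = [w * math.sqrt(mk) for w, mk in zip(size_mass, residual_second_moment)]
    total = sum(raw)
    if total == 0.0:
        # All residuals vanish: any law works, so fall back to the target mass.
        return list(size_mass)
    return [r / total for r in raw]


def leading_second_moment(size_mass, residual_second_moment, q):
    """Sum over k of w_k**2 * M_k / q_k, the q-dependent part of m * variance here.

    Sizes with zero mass or zero residual moment contribute nothing and are skipped.
    """
    return sum(
        w * w * mk / qk
        for w, mk, qk in zip(size_mass, residual_second_moment, q)
        if w > 0.0 and mk > 0.0
    )


# Hypothetical inputs: four sizes with equal mass and pilot residual moments.
mass = [0.25, 0.25, 0.25, 0.25]
moments = [4.0, 1.0, 1.0, 0.0]

q_star = neyman_size_allocation(mass, moments)
at_optimum = leading_second_moment(mass, moments, q_star)
at_uniform = leading_second_moment(mass, moments, mass)
print(q_star, at_optimum, at_uniform)
```

The allocation spends no samples on the last size because the surrogate is assumed exact there, and it doubles the share of the size with the largest residual moment. In practice the moments are unknown and would have to be estimated from pilot draws, which is one place where higher-order effects can enter.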
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.