pith. sign in

arxiv: 2606.00867 · v1 · pith:7K3KMBCInew · submitted 2026-05-30 · 📊 stat.ML · cs.LG· eess.SP

Statistical Analysis of using the Shapley Value for Sensor Anomaly Localization with Accurate Classifiers

Pith reviewed 2026-06-28 17:50 UTC · model grok-4.3

classification 📊 stat.ML cs.LGeess.SP
keywords Shapley valuesensor anomaly localizationstatistical analysisindependent observationscorrelated bivariate Gaussiananomaly detectionoptimal binary classifiers
0
0 comments X

The pith

For independent sensor observations the Shapley value anomaly test is exactly equivalent to a simpler single-term test with identical error probability, while the two differ and one can outperform the other in correlated bivariate cases dep

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proves that when sensor observations are independent an optimized Shapley-value anomaly test reduces to a lower-complexity test that uses only one term from the Shapley calculation and produces the same probability of error. In contrast, for statistically dependent observations drawn from correlated bivariate Gaussian or Laplacian distributions with constant or Gaussian attacks the two tests generate different decision regions and different error probabilities. The analysis further shows that the full Shapley test is sometimes strictly worse and sometimes strictly better than the single-term test, with the outcome governed by the sign of the correlation when its magnitude is large. The results indicate that the two approaches can be combined to obtain a strictly superior test in the dependent setting.

Core claim

We prove that for cases with independent sensor observations, an optimized anomaly test using the Shapley value is equivalent to an optimized lower-complexity anomaly test using a single term in the Shapley value calculation, yielding the exact same probability of error. For some popular dependent observation cases involving two sensors, including correlated bivariate Gaussian/Laplacian probability density functions and constant/Gaussian attacks/anomalies, we prove that these two tests are fundamentally different, yielding different decision regions and error probabilities. Further, we prove that the Shapley value test is sometimes strictly inferior to the other test in certain statistically

What carries the argument

Comparison of the full Shapley value versus a single term from its calculation inside optimal binary classifiers for deciding whether a given sensor observation is anomalous.

If this is right

  • For independent observations the lower-complexity single-term test can be used in place of the Shapley-value test without any increase in error probability.
  • In the examined dependent bivariate cases the two tests produce different decision regions, so their error probabilities are not the same.
  • When correlation is large the relative performance of the two tests reverses with the sign of the correlation coefficient.
  • A test that combines the Shapley value and the single term can achieve strictly lower error probability than either alone in the dependent bivariate setting.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The equivalence result may extend to other joint distributions that satisfy conditional independence, allowing the simpler test to be used more broadly.
  • In networks with many sensors the computational saving from the single-term test could matter for real-time localization even when observations are only approximately independent.
  • A practical system could first test for independence among sensors and then select or combine the two methods accordingly.

Load-bearing premise

The existence of mathematically defined optimum binary classifiers together with the assumption that sensor observations follow either independence or the specific correlated bivariate Gaussian and Laplacian distributions with the listed attack types.

What would settle it

Explicit computation of the error probability for a bivariate Gaussian model with large positive correlation and additive anomaly, showing whether the Shapley-value test or the single-term test achieves the lower error rate.

Figures

Figures reproduced from arXiv: 2606.00867 by Rick S. Blum, Xubin Fang.

Figure 1
Figure 1. Figure 1: Statistical error probabilities for the binary hypothesis test discussed in the beginning of Theorem III.6. . As per [PITH_FULL_IMAGE:figures/full_fig_p007_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Plots of ∆(x1, x2) from Equation (26) as a function of x2 for various fixed values of x1. The observations follow a bivariate Laplacian distribution under an additive attack/anomaly. calculation. We theoretically and numerically compare the Shapley value test, using ϕ(xi), with the test using v(i), for both independent and dependent observation scenarios. For the case of statistically independent sensor ob… view at source ↗
read the original abstract

Recent publications have suggested using the Shap- ley value for sensor anomaly/attack localization. We study the performance of such an approach by using mathematically de- fined optimum binary classifiers in the Shapley value calculation. To judge localization performance, we study the ability of the Shapley value of a given sensor observation to determine if that observation is anomalous. First, we prove that for cases with independent sensor observations, an optimized anomaly test using the Shapley value is equivalent to an optimized lower-complexity anomaly test using a single term in the Shapley value calculation, yielding the exact same probability of error. For some popular dependent observation cases involving two sensors, including correlated bivariate Gaussian/Laplacian probability density functions and constant/Gaussian at- tacks/anomalies, we prove that these two tests are fundamentally different, yielding different decision regions and error probabil- ities. Further, we prove that the Shapley value test is sometimes strictly inferior to the other (single term in Shapley calculation) test in certain statistically dependent bivariate Gaussian scenarios with large correlation magnitude and additive attacks/anomalies, while it is strictly superior in others, depending on the sign of the correlation. One can combine these two approaches to obtain a strictly better approach in these cases. These results, which provide the first theoretical statistical analysis of Shapley-based localization, seem very interesting based on the wide acceptance of the Shapley value by many researchers and should encourage further research on this topic. Numerical results are provided which illustrate our findings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript analyzes the use of Shapley values for localizing sensor anomalies/attacks when optimal binary classifiers are employed in the value function. It proves equivalence (identical error probability) between a full Shapley-value test and a single-term test for independent observations. For two-sensor dependent cases (correlated bivariate Gaussian/Laplacian densities with constant or Gaussian additive attacks), the tests produce distinct decision regions and error rates; the Shapley test is shown to be strictly superior or inferior depending on correlation sign and magnitude, with a suggestion that combining the approaches yields a strictly better detector. Numerical illustrations are included.

Significance. If the derivations hold, the work supplies the first explicit statistical comparison of Shapley-based anomaly localization against simpler alternatives, establishing when the two coincide exactly and when one dominates. The parameter-free proofs on concrete distributions, together with the explicit superiority/inferiority results that depend on correlation sign, constitute a clear, falsifiable contribution that can guide practical use of Shapley values in sensor security.

minor comments (3)
  1. [§4] The abstract states that proofs are given for the dependent bivariate cases, but the main text should include an explicit statement of the attack magnitude range over which the strict superiority/inferiority holds (e.g., a lemma or corollary after the decision-region derivation).
  2. [§2] Notation for the value function v(S) and the single-term test should be introduced with a short table or equation block early in §2 to avoid repeated parenthetical definitions later.
  3. [§5] Figure captions for the numerical results should state the exact correlation values, attack type, and SNR used in each panel so that the plots can be reproduced without consulting the text.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of our manuscript and the recommendation for minor revision. The summary accurately captures the core contributions regarding equivalence for independent sensors and the strict superiority/inferiority results for correlated bivariate cases.

Circularity Check

0 steps flagged

No significant circularity; direct proofs on distributions

full rationale

The paper's central claims consist of explicit equivalence proofs and strict comparisons between the Shapley-value test and the single-term test. These are derived step-by-step from the definitions of optimal binary classifiers (assumed to exist) and the concrete joint PDFs (independent sensors; correlated bivariate Gaussian/Laplacian with additive attacks). Decision regions and error probabilities are obtained by direct integration or comparison of likelihood ratios under the stated models. No parameters are fitted and then relabeled as predictions, no self-citation chain supports a load-bearing uniqueness claim, and no ansatz is imported via prior work. The results therefore reduce only to the input distributions and classifier optimality assumption, which are external to the derivation itself.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on standard domain assumptions about observation distributions and classifier optimality rather than new free parameters or invented entities.

axioms (2)
  • domain assumption Sensor observations follow the examined distributions including independent cases and correlated bivariate Gaussian or Laplacian PDFs with constant or Gaussian attacks
    Invoked to derive decision regions and error probabilities for both the Shapley test and the single-term test in the dependent scenarios.
  • domain assumption Mathematically defined optimum binary classifiers exist for the anomaly detection task
    Required to compare the performance of the Shapley-value test against the lower-complexity single-term test.

pith-pipeline@v0.9.1-grok · 5807 in / 1529 out tokens · 30360 ms · 2026-06-28T17:50:43.499752+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

42 extracted references · 20 canonical work pages · 1 internal anchor

  1. [1]

    Or- der statistic estimation with application to tracking in autonomous driving,

    K. Yang, Y . Bar-Shalom, P. Willett, and S. Hunt, “Or- der statistic estimation with application to tracking in autonomous driving,”IEEE Transactions on Aerospace and Electronic Systems, vol. 59, no. 4, pp. 3531–3538, 2023.DOI: 10.1109/TAES.2022.3229650

  2. [2]

    Dis- tributed detection of a nuclear radioactive source using fusion of correlated decisions,

    A. Sundaresan, P. K. Varshney, and N. S. Rao, “Dis- tributed detection of a nuclear radioactive source using fusion of correlated decisions,” in2007 10th Interna- tional Conference on Information Fusion, 2007, pp. 1– 7.DOI: 10.1109/ICIF.2007.4408143

  3. [3]

    Harvesting am- bient rf for presence detection through deep learning,

    Y . Liu, T. Wang, Y . Jiang, and B. Chen, “Harvesting am- bient rf for presence detection through deep learning,” IEEE Transactions on Neural Networks and Learning Systems, vol. 33, no. 4, pp. 1571–1583, 2022.DOI: 10. 1109/TNNLS.2020.3042908

  4. [4]

    Sensors cybersecurity,

    D. A. Gritzalis, G. Pantziou, and R. Rom ´an-Castro, “Sensors cybersecurity,”Sensors (Basel), vol. 21, no. 5, Mar. 2021.DOI: 10.3390/s21051762 [5]The global risks report 2020, 2020. [Online]. Available: https : / / www. weforum . org / reports / the - global - risks - report-2020

  5. [5]

    E-sfd: Explainable sensor fault detection in the ics anomaly detection system,

    C. Hwang and T. Lee, “E-sfd: Explainable sensor fault detection in the ics anomaly detection system,”IEEE Access, vol. 9, pp. 140 470–140 486, 2021.DOI: 10 . 1109/ACCESS.2021.3119573

  6. [6]

    Unsupervised multi-sensor anomaly local- ization with explainable ai,

    M. Ameli, V . Pfanschilling, A. Amirli, W. Maass, and K. Kersting, “Unsupervised multi-sensor anomaly local- ization with explainable ai,” inInternational Conference on Artificial Intelligence Applications and Innovations, Springer, 2022, pp. 507–519

  7. [7]

    Z. Li, Y . Zhu, and M. van Leeuwen,A survey on ex- plainable anomaly detection, 2023. arXiv: 2210.06959 [cs.LG]. [Online]. Available: https://arxiv.org/abs/ 2210.06959

  8. [8]

    Asymptotic performance of a distributed detection system in correlated gaus- sian noise,

    V . Aalo and R. Viswanathan, “Asymptotic performance of a distributed detection system in correlated gaus- sian noise,”IEEE Transactions on Signal Processing, vol. 40, no. 1, pp. 211–213, 1992

  9. [9]

    Target location estima- tion in sensor networks with quantized data,

    R. Niu and P. K. Varshney, “Target location estima- tion in sensor networks with quantized data,”IEEE Transactions on Signal Processing, vol. 54, no. 12, pp. 4519–4528, Dec. 2006.DOI: 10.1109/TSP.2006. 882082

  10. [10]

    On the optimality of the likelihood ratio test for local sensor decision rules in the presence of nonindependent observations,

    B. Chen and P. K. Willett, “On the optimality of the likelihood ratio test for local sensor decision rules in the presence of nonindependent observations,”IEEE Transactions on Signal Processing, vol. 48, no. 12, pp. 3281–3291, 2000

  11. [11]

    Decentralized detection in sensor networks,

    J.-F. Chamberland and V . V . Veeravalli, “Decentralized detection in sensor networks,”IEEE Transactions on Signal Processing, vol. 51, no. 8, pp. 2107–2116, 2003

  12. [12]

    Distributed detec- tion in the presence of byzantine sensors in wireless sen- sor networks,

    S. Marano, V . Matta, and L. Tong, “Distributed detec- tion in the presence of byzantine sensors in wireless sen- sor networks,”IEEE Transactions on Signal Processing, vol. 57, no. 8, pp. 2875–2888, 2009

  13. [13]

    Bayesian quick- est detection of propagating spatial events,

    T. Halme, E. Nitzan, and V . Koivunen, “Bayesian quick- est detection of propagating spatial events,”IEEE Trans- actions on Signal Processing, vol. 70, pp. 5982–5995, 2022.DOI: 10.1109/TSP.2022.3230334

  14. [14]

    Music, maximum likelihood, and cramer-rao bound,

    P. Stoica and A. Nehorai, “Music, maximum likelihood, and cramer-rao bound,”IEEE Transactions on Acous- tics, Speech, and Signal Processing, vol. 37, no. 5, pp. 720–741, 1989, Provides foundational statistical analysis comparing subspace methods to theoretical bounds

  15. [15]

    Sensor array processing based on subspace fitting,

    M. Viberg and B. Ottersten, “Sensor array processing based on subspace fitting,”IEEE Transactions on Signal Processing, vol. 39, no. 5, pp. 1110–1121, 1991, Rigor- 11 ous asymptotic statistical analysis of weighted subspace fitting estimators

  16. [16]

    A sensitivity analysis of the music algorithm,

    B. Friedlander, “A sensitivity analysis of the music algorithm,”IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 38, no. 10, pp. 1740–1751, 1990, Detailed statistical perturbation analysis of eigen- vectors and eigenvalues for subspace methods

  17. [17]

    Error-energy bounds for adaptive filtering algorithms,

    A. H. Sayed and M. Rupp, “Error-energy bounds for adaptive filtering algorithms,”IEEE Transactions on Signal Processing, vol. 45, no. 9, pp. 2267–2279, 1997, A landmark paper using energy conservation to analyze the statistical stability and performance of adaptive filters

  18. [18]

    Performance advantage of complex lms for controlling narrow-band adaptive arrays,

    L. Horowitz and K. Senne, “Performance advantage of complex lms for controlling narrow-band adaptive arrays,”IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 29, no. 3, pp. 722–736, 1981, A classic derivation of the second-order moment statistics for gradient-based adaptive algorithms

  19. [19]

    Model-order selection: A review of information criterion rules,

    P. Stoica and Y . Selen, “Model-order selection: A review of information criterion rules,”IEEE Signal Processing Magazine, vol. 21, no. 4, pp. 36–47, 2004, Analyzes the statistical consistency of criteria like AIC and BIC in the TSP context

  20. [20]

    Identifiability of the parameters of non-gaussian arma models using cumu- lant statistics,

    A. Swami and J. M. Mendel, “Identifiability of the parameters of non-gaussian arma models using cumu- lant statistics,”IEEE Transactions on Signal Processing, vol. 39, no. 11, pp. 2450–2460, 1991, Rigorous statis- tical proof of model identifiability using higher-order statistics

  21. [21]

    The cramer-rao bound for estimating a sparse parameter vector,

    Z. Ben-Haim and Y . C. Eldar, “The cramer-rao bound for estimating a sparse parameter vector,”IEEE Trans- actions on Signal Processing, vol. 58, no. 6, pp. 3384– 3389, 2010, Statistical characterization of the funda- mental limits in sparse signal recovery

  22. [22]

    Second-order multidimensional ica: Performance analysis,

    D. Lahat, J.-F. Cardoso, and H. Messer, “Second-order multidimensional ica: Performance analysis,”IEEE Transactions on Signal Processing, vol. 60, no. 9, pp. 4598–4610, 2012, Statistical performance character- ization of blind source separation under various noise conditions

  23. [23]

    On the likelihood ratio for detection of signals in non-gaussian noise,

    B. Picinbono, “On the likelihood ratio for detection of signals in non-gaussian noise,”IEEE Transactions on Signal Processing, vol. 43, no. 1, pp. 321–324, 1995

  24. [24]

    D. S. Watson, J. O’Hara, N. Tax, R. Mudd, and I. Guy,Explaining predictive uncertainty with information theoretic shapley values, 2023. arXiv: 2306 . 05724 [stat.ML]. [Online]. Available: https : / / arxiv. org / abs/2306.05724

  25. [25]

    H. V . Poor,An introduction to signal detection and estimation. Springer Science & Business Media, 2013

  26. [26]

    A value for n-person games,

    L. S. Shapley, “A value for n-person games,” inContri- butions to the Theory of Games, Volume II, ser. Annals of Mathematics Studies, H. Kuhn and A. Tucker, Eds., vol. 28, Princeton, NJ: Princeton University Press, 1953, pp. 307–317

  27. [27]

    Deep learning for anomaly detection: A review,

    G. Pang, C. Shen, L. Cao, and A. V . D. Hengel, “Deep learning for anomaly detection: A review,”ACM Comput. Surv., vol. 54, no. 2, Mar. 2021,ISSN: 0360- 0300.DOI: 10.1145/3439950 [Online]. Available: https: //doi.org/10.1145/3439950

  28. [28]

    Deep Learning for Anomaly Detection: A Survey

    R. Chalapathy and S. Chawla, “Deep learning for anomaly detection: A survey,”CoRR, vol. abs/1901.03407, 2019. arXiv: 1901 . 03407. [Online]. Available: http://arxiv.org/abs/1901.03407

  29. [29]

    Explainable ai: Using shapley value to explain complex anomaly detection ml-based,

    J. Zou and O. Petrosian, “Explainable ai: Using shapley value to explain complex anomaly detection ml-based,” Artificial Intelligence and Applications, pp. 152–164, 2023.DOI: 10.3233/FAIA200777

  30. [30]

    Using kernel shap xai method to optimize the network anomaly detection model,

    K. Roshan and A. Zafar, “Using kernel shap xai method to optimize the network anomaly detection model,” in 2022 9th International Conference on Computing for Sustainable Global Development (INDIACom), 2022, pp. 74–80.DOI: 10 . 23919 / INDIACom54597 . 2022 . 9763241

  31. [31]

    Takeishi and Y

    N. Takeishi and Y . Kawahara,A characteristic function for shapley-value-based attribution of anomaly scores,

  32. [32]

    [Online]

    arXiv: 2004.04464[cs.LG]. [Online]. Avail- able: https://arxiv.org/abs/2004.04464

  33. [33]

    Shapley values of reconstruction errors of pca for explaining anomaly detection,

    N. Takeishi, “Shapley values of reconstruction errors of pca for explaining anomaly detection,” in2019 International Conference on Data Mining Workshops (ICDMW), 2019, pp. 793–798.DOI: 10.1109/ICDMW. 2019.00117

  34. [34]

    Explaining anomalies detected by autoencoders using shapley additive explanations,

    L. Antwarg, R. M. Miller, B. Shapira, and L. Rokach, “Explaining anomalies detected by autoencoders using shapley additive explanations,”Expert Systems with Applications, vol. 186, p. 115 736, 2021,ISSN: 0957- 4174.DOI: https://doi.org/10.1016/j.eswa.2021.115736 [Online]. Available: https : / / www. sciencedirect . com / science/article/pii/S0957417421011155

  35. [35]

    Model inde- pendent feature attributions: Shapley values that un- cover non-linear dependencies,

    D. Fryer, I. Strumke, and H. Nguyen, “Model inde- pendent feature attributions: Shapley values that un- cover non-linear dependencies,”PeerJ Computer Sci- ence, vol. 7, e582, Jun. 2021.DOI: 10.7717/peerj-cs.582

  36. [36]

    Explaining indi- vidual predictions when features are dependent: More accurate approximations to shapley values,

    K. Aas, M. Jullum, and A. Løland, “Explaining indi- vidual predictions when features are dependent: More accurate approximations to shapley values,”Artificial Intelligence, vol. 298, p. 103 502, 2021,ISSN: 0004- 3702.DOI: https://doi.org/10.1016/j.artint.2021.103502 [Online]. Available: https : / / www. sciencedirect . com / science/article/pii/S0004370221000539

  37. [37]

    On shapley value for measur- ing importance of dependent inputs,

    A. Owen and C. Prieur, “On shapley value for measur- ing importance of dependent inputs,”SIAM/ASA Journal on Uncertainty Quantification, vol. 5, Oct. 2016.DOI: 10.1137/16M1097717

  38. [38]

    The many shapley values for model explanation,

    M. Sundararajan and A. Najmi, “The many shapley values for model explanation,” inProceedings of the 37th International Conference on Machine Learning, H. D. III and A. Singh, Eds., ser. Proceedings of Machine Learning Research, vol. 119, PMLR, 13–18 Jul 2020, pp. 9269–9278. [Online]. Available: https : //proceedings.mlr.press/v119/sundararajan20b.html

  39. [39]

    On using the shapley value for anomaly localization: A statistical investigation,

    R. S. Blum, X. Fang, and F. Freytag, “On using the shapley value for anomaly localization: A statistical investigation,”Applied AI Letters, vol. 7, no. 2, e70024, 12 2026, e70024 AAIL-2025-057.DOI: https://doi.org/10. 1002/ail2.70024 eprint: https://onlinelibrary.wiley.com/ doi/pdf/10.1002/ail2.70024. [Online]. Available: https: //onlinelibrary.wiley.com/...

  40. [40]

    Multivariate generalized gaussian distribution: Convexity and graph- ical models,

    T. Zhang, A. Wiesel, and M. S. Greco, “Multivariate generalized gaussian distribution: Convexity and graph- ical models,”IEEE Transactions on Signal Processing, vol. 61, no. 16, pp. 4141–4148, 2013.DOI: 10.1109/ TSP.2013.2267740

  41. [41]

    A mul- tivariate generalization of the power exponential family of distributions,

    E. G ´omez, M. Gomez-Villegas, and J. Mar ´ın, “A mul- tivariate generalization of the power exponential family of distributions,”Communications in Statistics-Theory and Methods, vol. 27, no. 3, pp. 589–600, 1998

  42. [42]

    M. E. Johnson,Multivariate Statistical Simulation. New York, NY: John Wiley & Sons, 1987, ch. 6,ISBN: 978- 0471822875