pith. sign in

arxiv: 2605.28086 · v1 · pith:UA3N4BFPnew · submitted 2026-05-27 · 💻 cs.SI · cs.DB

Efficient Shapley-Based Influence Attribution in Social Networks

Pith reviewed 2026-06-29 09:41 UTC · model grok-4.3

classification 💻 cs.SI cs.DB
keywords influence attributionShapley valuessocial networksdiffusion modelsex-ante attributionindependent cascadeseed selectioncooperative game theory
0
0 comments X

The pith

Shapley values allow fair ex-ante attribution of each seed's marginal contribution to expected influence spread in a social network.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a method to assign credit to seed users for their role in spreading information before the diffusion process even starts. This is done by adapting Shapley values from cooperative game theory to measure each seed's average marginal impact across all possible subsets of seeds. Such attribution matters for deciding how to pay influencers or allocate budgets without needing to observe actual cascades after the fact. The approach handles the stochastic nature of diffusion by focusing on expected values under a known model like the independent cascade model. For simple cases like single-step activation, exact polynomial-time computation is possible, while longer propagations require approximations with guarantees.

Core claim

Adapting Shapley values to the influence propagation game yields a principled way to compute each seed's expected marginal contribution to the spread under a known diffusion model, with polynomial-time algorithms for single-step activation, #P-hardness for multi-step cases, and approximation algorithms for the independent cascade model.

What carries the argument

Shapley value computation in the cooperative game defined by seeds as players and the value function as the expected number of activated nodes under the diffusion model.

If this is right

  • Budget allocation among potential influencers can be done fairly based on their expected contributions prior to any campaign launch.
  • Credit distribution for influence can occur without sharing or observing private cascade data from actual runs.
  • Algorithms exist that run in polynomial time when activation happens in one step, providing exact values.
  • Approximation schemes with provable guarantees apply to standard independent cascade diffusion on general networks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the underlying diffusion probabilities are accurately estimated from data, the attributions should correlate with observed contributions in repeated simulations.
  • This framework could extend to other stochastic processes where marginal contributions need to be attributed fairly without full observation.
  • Combining with influence maximization techniques might allow selecting seeds while also planning their compensation in one optimization step.

Load-bearing premise

The exact probabilities governing how influence spreads along edges are known in advance and can be used to calculate expected spreads.

What would settle it

Simulate many diffusions from different seed subsets and check whether the average marginal contributions match the computed Shapley values; a systematic mismatch would indicate the method does not capture true marginal impacts.

Figures

Figures reproduced from arXiv: 2605.28086 by Amir Gilad, Fangzhu Shen, Sudeepa Roy.

Figure 1
Figure 1. Figure 1: (a) shows a toy network of the IC model for Exam [PITH_FULL_IMAGE:figures/full_fig_p004_1.png] view at source ↗
Figure 3
Figure 3. Figure 3: Runtime for Single-step termination on Facebook [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗
Figure 2
Figure 2. Figure 2: Shapley values for top-50 seeds selected by greedy [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗
Figure 4
Figure 4. Figure 4: Relative error vs. # seeds for K-steps termination 1K 5K 10K 50K 100K Log Number of nodes 1 2 3 4 5 6 Relative Error (%) (a) Varying number of nodes 5 25 50 75 100 Average degree 1 2 3 4 5 6 7 Relative Error (%) (b) Varying average degree ApproxPermuteMC ApproxLiveEdgeGraph ApproxRRset [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Relative error vs. network structure for [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗
Figure 7
Figure 7. Figure 7: a shows increasing K yields diminishing accuracy returns, as most activations occur in early steps (K ≤ 16). While both al￾gorithms converge similarly, ApproxLiveEdge’s error increases while ApproxRRset’s decreases as K grows. This occurs because ApproxLiveEdge’s forward graph traversal incurs higher variance with longer propagation paths, while ApproxRRset’s backward graph traversal benefits from deeper p… view at source ↗
read the original abstract

The ubiquity of social platforms has reshaped the way information, behaviors, and advertisements diffuse across networks, with influence propagation often initiated by a small set of ``seed'' users. While much of the literature emphasizes optimizing seed selection to maximize spread, a critical yet underexplored question remains: how to fairly estimate the contributions of individual seeds ``ex-ante'', i.e., before the diffusion process occurs? This capability is essential for budget allocation, influencer pricing, and fair, privacy-preserving credit distribution under uncertainty, without relying on ex-post cascade logs that capture only a single execution of influence propagation. We introduce a framework for ex-ante influence attribution based on Shapley values from cooperative game theory, which capture each seed's marginal impact in a principled and equitable manner. Adapting Shapley values to influence propagation raises unique computational challenges due to the stochastic nature of diffusion and the intricate dependencies across network structures. To address these challenges, we design polynomial-time algorithms for the special case of single-step activation that is of independent practical interest, establish a sharp tractability boundary by proving $\#P$-hardness for any propagation beyond one step, and develop approximation algorithms with provable guarantees for the standard IC model as well as time-bounded variants. Empirical evaluation on real-world and synthetic networks demonstrates that our methods are both efficient and effective, offering a practical mechanism for ex-ante influence attribution.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 1 minor

Summary. The paper introduces a framework for ex-ante influence attribution in social networks based on Shapley values from cooperative game theory applied to an expected marginal-contribution game under a known diffusion model (e.g., Independent Cascade). It claims polynomial-time exact algorithms for the single-step activation case, a sharp #P-hardness result for any multi-step propagation, approximation algorithms with provable guarantees for the general IC model and time-bounded variants, and empirical validation showing efficiency and effectiveness on real-world and synthetic networks.

Significance. If the algorithmic claims and hardness boundary hold, the work supplies a principled, equitable method for attributing influence before any diffusion occurs. This directly supports applications in budget allocation, influencer pricing, and privacy-preserving credit assignment without requiring observed cascades. The explicit statement that edge probabilities are known a priori removes the circularity risk noted in the stress-test; the tractability boundary and approximation guarantees would be notable contributions to the influence maximization literature.

minor comments (1)
  1. [Abstract] Abstract: the phrase 'approximation algorithms with provable guarantees' would be more informative if the specific ratios or conditions (e.g., for the IC model) were stated explicitly rather than left at a high level.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive review and the recommendation to accept. The summary accurately captures the paper's contributions on ex-ante Shapley-based influence attribution, the polynomial algorithms for single-step activation, the #P-hardness result, and the approximation algorithms for the IC model.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper applies the standard definition of Shapley values to a cooperative game whose characteristic function is the expected influence spread under a known diffusion model (IC or similar). This is a direct, non-reductive use of the marginal-contribution definition on an externally specified game; the algorithmic results (poly-time for single-step activation, #P-hardness for multi-step, and approximation algorithms) are derived from network and stochastic-process properties rather than from any fitted parameter, self-referential equation, or self-citation chain. No load-bearing step reduces to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The framework relies on the standard cooperative-game definition of Shapley values and on the independent-cascade diffusion model; no new free parameters, ad-hoc axioms, or invented entities are introduced in the abstract.

axioms (2)
  • standard math Shapley value is the unique fair allocation satisfying the standard axioms of cooperative games
    Invoked when defining the attribution as marginal contribution averaged over coalitions.
  • domain assumption The diffusion model (edge probabilities) is known and fixed before attribution
    Required to compute expected marginal contributions without observed cascades.

pith-pipeline@v0.9.1-grok · 5778 in / 1259 out tokens · 19823 ms · 2026-06-29T09:41:46.912425+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

54 extracted references · 3 canonical work pages

  1. [1]

    Selahattin Akkas and Ariful Azad. 2024. Gnnshap: Scalable and accurate gnn explanation using shapley values. InProceedings of the ACM Web Conference 2024. 827–838

  2. [2]

    Marcelo Arenas, Pablo Barceló, Leopoldo Bertossi, and Mikaël Monet. 2021. The tractability of SHAP-score-based explanations for classification over deterministic and decomposable boolean circuits. InProceedings of the AAAI Conference on Artificial Intelligence, Vol. 35. 6670–6678

  3. [3]

    Roland Bacher. 2002. Determinants of matrices related to the Pascal triangle. Journal de théorie des nombres de Bordeaux14, 1 (2002), 19–41

  4. [4]

    Lars Backstrom, Dan Huttenlocher, Jon Kleinberg, and Xiangyang Lan. 2006. Group formation in large social networks: membership, growth, and evolution. InProceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining. 44–54

  5. [5]

    Phillip Bonacich. 1972. Factoring and weighting approaches to status scores and clique identification.Journal of mathematical sociology2, 1 (1972), 113–120

  6. [6]

    Christian Borgs, Michael Brautbar, Jennifer Chayes, and Brendan Lucier. 2014. Maximizing social influence in nearly optimal time. InProceedings of the twenty- fifth annual ACM-SIAM symposium on Discrete algorithms. SIAM, 946–957

  7. [7]

    Sergey Brin and Lawrence Page. 1998. The anatomy of a large-scale hypertextual web search engine.Computer networks and ISDN systems30, 1-7 (1998), 107–117

  8. [8]

    Mark Alexander Burgess and Archie C Chapman. 2021. Approximating the Shapley Value Using Stratified Empirical Bernstein Sampling.. InIJCAI. 73–81

  9. [9]

    Javier Castro, Daniel Gómez, and Juan Tejada. 2009. Polynomial calculation of the Shapley value based on sampling.Computers & Operations Research36, 5 (2009), 1726–1730

  10. [10]

    Wei Chen and Shang-Hua Teng. 2017. Interplay between social influence and network centrality: A comparative study on shapley centrality and single-node- influence centrality. InProceedings of the 26th international conference on world wide web. 967–976

  11. [11]

    Wei Chen, Chi Wang, and Yajun Wang. 2010. Scalable influence maximization for prevalent viral marketing in large-scale social networks. InProceedings of KDD ’26, August 09–13, 2026, Jeju Island, Republic of Korea Shen et al. the 16th ACM SIGKDD international conference on Knowledge discovery and data mining. 1029–1038

  12. [12]

    Xiaolong Chen, Yifan Song, and Jing Tang. 2024. Link recommendation to augment influence diffusion with provable guarantees. InProceedings of the ACM Web Conference 2024. 2509–2518

  13. [13]

    Fan Chung and Linyuan Lu. 2006. Concentration inequalities and martingale inequalities: a survey.Internet mathematics3, 1 (2006), 79–127

  14. [14]

    dblp Team. 2025. dblp computer science bibliography – Monthly Snapshot XML Release of April 2025. doi:10.4230/dblp.xml.2025-04-01

  15. [15]

    Erik D Demaine, MohammadTaghi Hajiaghayi, Hamid Mahini, David L Malec, S Raghavan, Anshul Sawant, and Morteza Zadimoghadam. 2014. How to influence people with partial incentives. InProceedings of the 23rd international conference on World wide web. 937–948

  16. [16]

    Daniel Deutch, Nave Frost, Amir Gilad, and Oren Sheffer. 2021. Explanations for data repair through shapley values. InProceedings of the 30th ACM International Conference on Information & Knowledge Management. 362–371

  17. [17]

    Daniel Deutch, Nave Frost, Benny Kimelfeld, and Mikaël Monet. 2022. Comput- ing the Shapley Value of Facts in Query Answering. InProceedings of the 2022 International Conference on Management of Data. 1570–1583

  18. [18]

    Pedro Domingos and Matt Richardson. 2001. Mining the network value of customers. InProceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining. 57–66

  19. [19]

    P ERDdS and A R&wi. 1959. On random graphs I.Publ. math. debrecen6, 290-297 (1959), 18

  20. [20]

    Christian G Fink, Kelly Fullin, Guillermo Gutierrez, Nathan Omodt, Sydney Zinnecker, Gina Sprint, and Sean McCulloch. 2023. A centrality measure for quantifying spread on weighted, directed networks.Physica A: Statistical Me- chanics and its Applications626 (2023), 129083

  21. [21]

    Christian G Fink, Nathan Omodt, Sydney Zinnecker, and Gina Sprint. 2023. A Congressional Twitter network dataset quantifying pairwise probability of influence.Data in Brief50 (2023), 109521

  22. [22]

    Linton C Freeman. 1978. Centrality in social networks conceptual clarification. Social networks1, 3 (1978), 215–239

  23. [23]

    Daniel Fryer, Inga Strümke, and Hien Nguyen. 2021. Shapley values for feature selection: The good, the bad, and the axioms.Ieee Access9 (2021), 144352–144360

  24. [24]

    Noemi Gasko, Tamás Képes, Rodica Ioana Lung, and Mihai Suciu. 2023. Iden- tification of influential nodes with shapley influence maximization extremal optimization algorithm.Applied Soft Computing146 (2023), 110653

  25. [25]

    Amirata Ghorbani and James Zou. 2019. Data shapley: Equitable valuation of data for machine learning. InInternational conference on machine learning. PMLR, 2242–2251

  26. [26]

    Rumi Ghosh, Shang-hua Teng, Kristina Lerman, and Xiaoran Yan. 2014. The interplay between dynamics and networks: centrality, communities, and cheeger inequality. InProceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. 1406–1415

  27. [27]

    Wassily Hoeffding. 1963. Probability inequalities for sums of bounded random variables.Journal of the American statistical association58, 301 (1963), 13–30

  28. [28]

    Ruoxi Jia, David Dao, Boxin Wang, Frances Ann Hubis, Nick Hynes, Nezihe Merve Gürel, Bo Li, Ce Zhang, Dawn Song, and Costas J Spanos. 2019. Towards efficient data valuation based on the shapley value. InThe 22nd International Conference on Artificial Intelligence and Statistics. PMLR, 1167–1176

  29. [29]

    Ahmet Kara, Dan Olteanu, and Dan Suciu. 2024. From Shapley Value to Model Counting and Back.Proceedings of the ACM on Management of Data2, 2 (2024), 1–23

  30. [30]

    David Kempe, Jon Kleinberg, and Éva Tardos. 2003. Maximizing the spread of influence through a social network. InProceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining. 137–146

  31. [31]

    Jinha Kim, Seung-Keol Kim, and Hwanjo Yu. 2013. Scalable and parallelizable processing of influence maximization for large-scale social networks?. In2013 IEEE 29th international conference on data engineering (ICDE). IEEE, 266–277

  32. [32]

    Masahiro Kimura and Kazumi Saito. 2006. Tractable models for information diffusion in social networks. InEuropean conference on principles of data mining and knowledge discovery. Springer, 259–271

  33. [33]

    Jure Leskovec, Andreas Krause, Carlos Guestrin, Christos Faloutsos, Jeanne Van- Briesen, and Natalie Glance. 2007. Cost-effective outbreak detection in networks. InProceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). ACM, 420–429

  34. [34]

    Jure Leskovec, Kevin J Lang, Anirban Dasgupta, and Michael W Mahoney. 2009. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters.Internet Mathematics6, 1 (2009), 29–123

  35. [35]

    Jure Leskovec and Julian Mcauley. 2012. Learning to discover social circles in ego networks.Advances in neural information processing systems25 (2012)

  36. [36]

    Ester Livshits, Leopoldo Bertossi, Benny Kimelfeld, and Moshe Sebag. 2021. The Shapley value of tuples in query answering.Logical Methods in Computer Science 17 (2021)

  37. [37]

    Ester Livshits and Benny Kimelfeld. 2022. The Shapley value of inconsistency measures for functional dependencies.Logical Methods in Computer Science18 (2022)

  38. [38]

    Scott M Lundberg and Su-In Lee. 2017. A unified approach to interpreting model predictions.Advances in neural information processing systems30 (2017)

  39. [39]

    Tomasz P Michalak, Karthik V Aadithya, Piotr L Szczepanski, Balaraman Ravin- dran, and Nicholas R Jennings. 2013. Efficient computation of the Shapley value for game-theoretic network centrality.Journal of Artificial Intelligence Research 46 (2013), 607–650

  40. [40]

    Rory Mitchell, Joshua Cooper, Eibe Frank, and Geoffrey Holmes. 2022. Sampling permutations for shapley value estimation.Journal of Machine Learning Research 23, 43 (2022), 1–46

  41. [41]

    Ramasuri Narayanam and Yadati Narahari. 2011. A Shapley Value-Based Ap- proach to Discover Influential Nodes in Social Networks.IEEE Transactions on Automation Science and Engineering8, 1 (2011), 130–147. doi:10.1109/TASE.2010. 2052042

  42. [42]

    Junyuan Pang, Jian Pei, Haocheng Xia, Xiang Li, and Jinfei Liu. 2025. Shap- ley Value Estimation based on Differential Matrix.Proceedings of the ACM on Management of Data3, 1 (2025), 1–28

  43. [43]

    Alon Reshef, Benny Kimelfeld, and Ester Livshits. 2020. The impact of negation on the complexity of the Shapley value in conjunctive queries. InProceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems. 285–297

  44. [44]

    Lloyd S Shapley et al. 1953. A value for n-person games. (1953)

  45. [45]

    Raghav Singal, Omar Besbes, Antoine Desir, Vineet Goyal, and Garud Iyengar

  46. [46]

    InThe world wide web conference

    Shapley meets uniform: An axiomatic framework for attribution in online advertising. InThe world wide web conference. 1713–1723

  47. [47]

    Youze Tang, Yanchen Shi, and Xiaokui Xiao. 2015. Influence maximization in near-linear time: A martingale approach. InProceedings of the 2015 ACM SIGMOD international conference on management of data. 1539–1554

  48. [48]

    Youze Tang, Xiaokui Xiao, and Yanchen Shi. 2014. Influence maximization: Near- optimal time complexity meets practical efficiency. InProceedings of the 2014 ACM SIGMOD international conference on Management of data. 75–86

  49. [49]

    Leslie G. Valiant. 1979. The Complexity of Enumeration and Reliability Problems. SIAM J. Comput.8 (Aug. 1979), 410–421. doi:10.1137/0208032

  50. [50]

    Guy Van den Broeck, Anton Lykov, Maximilian Schleich, and Dan Suciu. 2022. On the tractability of SHAP explanations.Journal of Artificial Intelligence Research 74 (2022), 851–886

  51. [51]

    Jaewon Yang and Jure Leskovec. 2012. Defining and evaluating network commu- nities based on ground-truth. InProceedings of the ACM SIGKDD workshop on mining data semantics. 1–8

  52. [52]

    Jiayao Zhang, Qiheng Sun, Jinfei Liu, Li Xiong, Jian Pei, and Kui Ren. 2023. Efficient sampling approaches to shapley value approximation.Proceedings of the ACM on Management of Data1, 1 (2023), 1–24

  53. [53]

    Yuqing Zhu, Jing Tang, Xueyan Tang, and Lei Chen. 2021. Analysis of influence contribution in social advertising.Proceedings of the VLDB Endowment15, 2 (2021), 348–360. A Details From Section 3 Shapley Values.Given a set of players 𝑁 and a value (or utility) function 𝑈 : 2𝑁 →R with 𝑈 (∅) = 0, where 𝑈 (𝑆)denotes the value raised by the subset 𝑆 of players,...

  54. [54]

    𝜽 ′∑︁ 𝑗=1 𝜔(𝑹 (1) 𝑗 ) # −E[𝜽 ′]·𝐸𝑃𝑇= 0(159) KDD ’26, August 09–13, 2026, Jeju Island, Republic of Korea Shen et al. Rearranging: E

    but with adjustment to our problem setting. Lemma 12.For any 𝑆⊆𝑇 and any 𝑣∈𝑉\𝑇 , denote 𝑹𝑣 as an RR set for 𝑣 in 𝐺 ′. The probability that 𝑹𝑣 intersects 𝑆 equals to the probability that 𝑆 activates 𝑣 in an influence propagation process on 𝐺: Pr[𝑆∩𝑹 𝑣 ̸=∅] = Pr[𝑆activates𝑣in𝐺](97) Proof. Let 𝑔 be a live-edge graph obtained by removing each edge 𝑒 with1 −𝑝 ...