pith. sign in

arxiv: 2401.11330 · v3 · pith:B6BMDFFLnew · submitted 2024-01-20 · 💻 cs.SI

Identifying the Source of Information Spread in Networks via Markov Chains

Pith reviewed 2026-05-24 04:38 UTC · model grok-4.3

classification 💻 cs.SI
keywords source detectioninformation diffusionMarkov chainindependent cascade modelmaximum likelihood estimationsocial networksnetwork analysis
0
0 comments X

The pith

The source of an information cascade is the node with highest stationary probability in a Markov chain built from observed infections.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper proposes an efficient method to solve the source detection problem in networks under the independent cascade model by constructing a Markov chain and computing its stationary distribution instead of directly maximizing the likelihood. The approach sidesteps the computational hardness of exact maximum likelihood estimation over all candidate sources. A sympathetic reader would care because tracing the origin of a diffusion helps understand and contain the spread of information or misinformation in social networks, and an efficient method makes this feasible at scale. Simulations demonstrate that the stationary distribution method performs effectively compared to existing techniques on both random graphs and real-world networks.

Core claim

We propose an efficient method for the source detection problem under the MLE approach, which is based on computing the stationary distribution of a Markov chain. Using simulations, we demonstrate the effectiveness of our method compared to other state-of-the-art methods from the literature, both on random and real-world networks.

What carries the argument

A Markov chain constructed from the network edges and the observed set of infected nodes, whose stationary distribution is used to identify the most likely source under maximum likelihood estimation.

If this is right

  • Source detection runs in time linear in network size rather than requiring exponential search over candidates.
  • The method scales to large real-world networks where brute-force likelihood computation is impossible.
  • It provides a practical alternative that maintains competitive accuracy in simulations on both synthetic and empirical graphs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same stationary-distribution ranking might apply to other diffusion models beyond independent cascade if similar likelihood structures hold.
  • The construction could extend to settings with multiple sources or noisy observations by adjusting the chain transitions.
  • This creates a direct computational bridge between stationary analysis and likelihood-based inference on graphs.

Load-bearing premise

The stationary distribution of the constructed Markov chain yields or closely approximates the node that maximizes the likelihood of the observed infected set.

What would settle it

On a small network where the exact maximum likelihood source can be computed by enumeration, check whether the node with the highest stationary probability matches the exact MLE source.

Figures

Figures reproduced from arXiv: 2401.11330 by Amos Azaria, Noam Hazon, Yael Sabato.

Figure 1
Figure 1. Figure 1: Graph examples. 6 Conversion Methods Now we detail how to produce the Markov chain graph GM = (SM, EM). Each node vi ∈ A′ is represented by a state si ∈ SM, and each directed edge (vi , vj ) ∈ E(A′ ) is represented by the reversed edge (sj , si) ∈ EM. We get that in the Markov chain, each edge is pointing from a node to all its possible activators. Therefore, a naive approach for converting the social grap… view at source ↗
Figure 2
Figure 2. Figure 2: The Markov chains that are obtained by applying the self loops method on the graphs [PITH_FULL_IMAGE:figures/full_fig_p008_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: The Markov chain that is obtained by stage (1) of the no-loops method for the social [PITH_FULL_IMAGE:figures/full_fig_p011_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: The average number of times in which the no-loops and the self-loops methods, using [PITH_FULL_IMAGE:figures/full_fig_p014_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: The average number of times in which the no-loops and the self-loops methods, using [PITH_FULL_IMAGE:figures/full_fig_p015_5.png] view at source ↗
read the original abstract

Nowadays, the diffusion of information through social networks is a powerful phenomenon. One common way to model diffusions in social networks is the Independent Cascade (IC) model. Given a set of infected nodes according to the IC model, a natural problem is the source detection problem, in which the goal is to identify the unique node that has started the diffusion. Maximum Likelihood Estimation (MLE) is a common approach for tackling the source detection problem, but it is computationally hard. In this work, we propose an efficient method for the source detection problem under the MLE approach, which is based on computing the stationary distribution of a Markov chain. Using simulations, we demonstrate the effectiveness of our method compared to other state-of-the-art methods from the literature, both on random and real-world networks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims to introduce an efficient method for the source detection problem under the Maximum Likelihood Estimation (MLE) approach in the Independent Cascade (IC) model. The method computes the stationary distribution of a constructed Markov chain to identify the source node, with effectiveness shown via simulations on random and real-world networks outperforming other state-of-the-art methods.

Significance. If the stationary distribution is shown to solve or approximate the MLE objective, the approach would offer a scalable alternative to an otherwise intractable problem, with potential value in network epidemiology and social diffusion studies. The empirical comparisons provide some support, though the lack of analytic grounding reduces overall significance.

major comments (2)
  1. [Abstract] Abstract: The claim that the method is 'based on computing the stationary distribution of a Markov chain' for the MLE approach lacks any derivation, explicit transition matrix definition, or argument establishing that the stationary distribution π_v is proportional to (or monotonic in) the likelihood P(S | source = v) under the IC model. This equivalence is load-bearing for the central claim.
  2. [Simulations] Simulations (throughout results): Validation rests on simulations, yet no details are provided on error bars, number of Monte Carlo runs, exact IC model parameters (e.g., edge probabilities), or precise baselines, making it impossible to evaluate whether the reported superiority is robust or statistically significant.
minor comments (1)
  1. Notation for the Markov chain construction (states, transitions) should be introduced with explicit definitions early in the manuscript to improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments. We address each major comment below and will revise the manuscript to incorporate the requested clarifications and details.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The claim that the method is 'based on computing the stationary distribution of a Markov chain' for the MLE approach lacks any derivation, explicit transition matrix definition, or argument establishing that the stationary distribution π_v is proportional to (or monotonic in) the likelihood P(S | source = v) under the IC model. This equivalence is load-bearing for the central claim.

    Authors: The body of the manuscript defines the specific Markov chain (with explicit transition probabilities derived from the IC model) and provides the argument that its stationary distribution ranks nodes in a manner consistent with the MLE objective. We agree, however, that the abstract is overly terse and omits this justification. In the revision we will expand the abstract and add a dedicated subsection that explicitly states the transition matrix and the monotonicity argument linking π_v to the likelihood. revision: yes

  2. Referee: [Simulations] Simulations (throughout results): Validation rests on simulations, yet no details are provided on error bars, number of Monte Carlo runs, exact IC model parameters (e.g., edge probabilities), or precise baselines, making it impossible to evaluate whether the reported superiority is robust or statistically significant.

    Authors: We will augment the experimental section with the missing information: the number of Monte Carlo repetitions per setting, standard-error bars on all reported metrics, the precise edge-probability values (or distributions) used for each network, and the exact implementation details and parameter settings of every baseline method. revision: yes

Circularity Check

0 steps flagged

No circularity; Markov chain construction presented as independent algorithmic device

full rationale

The paper proposes an efficient method for source detection under MLE for the IC model by computing the stationary distribution of a Markov chain, with effectiveness demonstrated via simulations on random and real-world networks. No derivation chain is exhibited in the provided text that reduces the claimed result to its own inputs by construction, self-definition, fitted parameters renamed as predictions, or load-bearing self-citations. The construction is described as an independent device rather than one whose output is tautologically equivalent to the input likelihoods or fitted values. This is the normal case of a self-contained algorithmic proposal validated empirically.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the standard Independent Cascade diffusion assumption and on the unstated claim that a Markov-chain stationary distribution solves or approximates the MLE objective; no free parameters or invented entities are visible in the abstract.

axioms (2)
  • domain assumption Observed infections are generated exactly according to the Independent Cascade model
    Explicitly stated as the modeling framework in the abstract.
  • ad hoc to paper The stationary distribution of the constructed Markov chain identifies the MLE source
    This equivalence is the load-bearing step of the proposed method but receives no justification in the abstract.

pith-pipeline@v0.9.0 · 5661 in / 1243 out tokens · 18256 ms · 2026-05-24T04:38:03.101173+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

24 extracted references · 24 canonical work pages

  1. [1]

    Agaskar, A., and Lu, Y. M. A fast monte carlo algorithm for source localization on graphs. In Wavelets and Sparsity XV (2013), vol. 8858, SPIE, pp. 429–434

  2. [2]

    Contrasting the spread of misinformation in online social networks

    Amoruso, M., Anello, D., Auletta, V., Cerulli, R., Ferraioli, D., and Raiconi, A. Contrasting the spread of misinformation in online social networks. Journal of Artificial Intelligence Research 69 (2020), 847–879

  3. [3]

    Batabyal, A. A. Markov chains: Models, algorithms and applications. Interfaces 36, 6 (2006), 609

  4. [4]

    N., and Myers, E

    Gabow, H. N., and Myers, E. W. Finding all spanning trees of directed and undirected graphs. SIAM Journal on Computing 7 , 3 (1978), 280–287

  5. [5]

    Talk of the network: A complex systems look at the underlying process of word-of-mouth

    Goldenberg, J., Libai, B., and Muller, E. Talk of the network: A complex systems look at the underlying process of word-of-mouth. Marketing letters 12 , 3 (2001), 211–223. 16

  6. [6]

    Inferring networks of diffusion and influence

    Gomez-Rodriguez, M., Leskovec, J., and Krause, A. Inferring networks of diffusion and influence. ACM Transactions on Knowledge Discovery from Data (TKDD) 5 , 4 (2012), 1–37

  7. [7]

    Identifying propagation sources in networks: State-of-the-art and comparative studies

    Jiang, J., Wen, S., Yu, S., Xiang, Y., and Zhou, W. Identifying propagation sources in networks: State-of-the-art and comparative studies. IEEE Communications Surveys & Tutorials 19, 1 (2016), 465–481

  8. [8]

    Schemes of propagation models and source estimators for rumor source detection in online social networks: A short survey of a decade of research

    Jin, R., and Wu, W. Schemes of propagation models and source estimators for rumor source detection in online social networks: A short survey of a decade of research. Discrete Mathematics, Algorithms and Applications (2021), 2130002

  9. [9]

    Maximizing the spread of influence through a social network

    Kempe, D., Kleinberg, J., and Tardos, ´E. Maximizing the spread of influence through a social network. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining (2003), ACM, pp. 137–146

  10. [10]

    Ct-ic: Continuously activated and time-restricted inde- pendent cascade model for viral marketing

    Kim, J., Lee, W., and Yu, H. Ct-ic: Continuously activated and time-restricted inde- pendent cascade model for viral marketing. Knowledge-Based Systems 62 (2014), 57–68

  11. [11]

    S., and Karamchandani, N

    Kumar, A., Borkar, V. S., and Karamchandani, N. Temporally agnostic rumor- source detection. IEEE Transactions on Signal and Information Processing over Networks 3, 2 (2017), 316–329

  12. [12]

    Finding effectors in social networks

    Lappas, T., Terzi, E., Gunopulos, D., and Mannila, H. Finding effectors in social networks. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining (2010), pp. 1059–1068

  13. [13]

    Estimating a probability using finite memory

    Leighton, F., and Rivest, R. Estimating a probability using finite memory. IEEE Transactions on Information Theory 32 , 6 (1986), 733–742

  14. [14]

    A., and Peres, Y

    Levin, D. A., and Peres, Y. Markov chains and mixing times , vol. 107. American Mathematical Soc., 2017

  15. [15]

    Myung, I. J. Tutorial on maximum likelihood estimation. Journal of mathematical Psy- chology 47, 1 (2003), 90–100

  16. [16]

    Detecting sources of computer viruses in networks: theory and experiment

    Shah, D., and Zaman, T. Detecting sources of computer viruses in networks: theory and experiment. In Proceedings of the ACM SIGMETRICS international conference on Measurement and modeling of computer systems (2010), pp. 203–214

  17. [17]

    Rumors in a network: Who’s the culprit? IEEE Transactions on information theory 57 , 8 (2011), 5163–5181

    Shah, D., and Zaman, T. Rumors in a network: Who’s the culprit? IEEE Transactions on information theory 57 , 8 (2011), 5163–5181

  18. [18]

    A strong-connectivity algorithm and its applications in data flow analysis

    Sharir, M. A strong-connectivity algorithm and its applications in data flow analysis. Computers & Mathematics with Applications 7 , 1 (1981), 67–72

  19. [19]

    Source detection of rumor in social network–a review

    Shelke, S., and Attar, V. Source detection of rumor in social network–a review. Online Social Networks and Media 9 (2019), 30–42

  20. [20]

    A., Li, S., Wu, W., and Du, D.-Z

    Tong, G. A., Li, S., Wu, W., and Du, D.-Z. Effector detection in social networks. IEEE Transactions on Computational Social Systems 3 , 4 (2016), 151–163

  21. [21]

    Multiple source detection without knowing the underlying propagation model

    Wang, Z., Wang, C., Pei, J., and Ye, X. Multiple source detection without knowing the underlying propagation model. In Proceedings of the AAAI Conference on Artificial Intelligence (2017), vol. 31. 17

  22. [22]

    Cascade source inference in networks: a markov chain monte carlo approach

    Zhai, X., Wu, W., and Xu, W. Cascade source inference in networks: a markov chain monte carlo approach. Computational social networks 2 , 1 (2015), 1–17

  23. [23]

    A markov chain monte carlo approach for source detection in networks

    Zhang, L., Jin, T., Xu, T., Chang, B., Wang, Z., and Chen, E. A markov chain monte carlo approach for source detection in networks. In Chinese National Conference on Social Media Processing (2017), Springer, pp. 77–88

  24. [24]

    Catch’em all: Locating multiple diffusion sources in networks with partial observations

    Zhu, K., Chen, Z., and Ying, L. Catch’em all: Locating multiple diffusion sources in networks with partial observations. In Proceedings of the AAAI Conference on Artificial Intelligence (2017), vol. 31. 18