Identifying the Source of Information Spread in Networks via Markov Chains

Amos Azaria; Noam Hazon; Yael Sabato

arxiv: 2401.11330 · v3 · pith:B6BMDFFLnew · submitted 2024-01-20 · 💻 cs.SI

Identifying the Source of Information Spread in Networks via Markov Chains

Yael Sabato , Amos Azaria , Noam Hazon This is my paper

Pith reviewed 2026-05-24 04:38 UTC · model grok-4.3

classification 💻 cs.SI

keywords source detectioninformation diffusionMarkov chainindependent cascade modelmaximum likelihood estimationsocial networksnetwork analysis

0 comments

The pith

The source of an information cascade is the node with highest stationary probability in a Markov chain built from observed infections.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper proposes an efficient method to solve the source detection problem in networks under the independent cascade model by constructing a Markov chain and computing its stationary distribution instead of directly maximizing the likelihood. The approach sidesteps the computational hardness of exact maximum likelihood estimation over all candidate sources. A sympathetic reader would care because tracing the origin of a diffusion helps understand and contain the spread of information or misinformation in social networks, and an efficient method makes this feasible at scale. Simulations demonstrate that the stationary distribution method performs effectively compared to existing techniques on both random graphs and real-world networks.

Core claim

We propose an efficient method for the source detection problem under the MLE approach, which is based on computing the stationary distribution of a Markov chain. Using simulations, we demonstrate the effectiveness of our method compared to other state-of-the-art methods from the literature, both on random and real-world networks.

What carries the argument

A Markov chain constructed from the network edges and the observed set of infected nodes, whose stationary distribution is used to identify the most likely source under maximum likelihood estimation.

If this is right

Source detection runs in time linear in network size rather than requiring exponential search over candidates.
The method scales to large real-world networks where brute-force likelihood computation is impossible.
It provides a practical alternative that maintains competitive accuracy in simulations on both synthetic and empirical graphs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same stationary-distribution ranking might apply to other diffusion models beyond independent cascade if similar likelihood structures hold.
The construction could extend to settings with multiple sources or noisy observations by adjusting the chain transitions.
This creates a direct computational bridge between stationary analysis and likelihood-based inference on graphs.

Load-bearing premise

The stationary distribution of the constructed Markov chain yields or closely approximates the node that maximizes the likelihood of the observed infected set.

What would settle it

On a small network where the exact maximum likelihood source can be computed by enumeration, check whether the node with the highest stationary probability matches the exact MLE source.

Figures

Figures reproduced from arXiv: 2401.11330 by Amos Azaria, Noam Hazon, Yael Sabato.

**Figure 1.** Figure 1: Graph examples. 6 Conversion Methods Now we detail how to produce the Markov chain graph GM = (SM, EM). Each node vi ∈ A′ is represented by a state si ∈ SM, and each directed edge (vi , vj ) ∈ E(A′ ) is represented by the reversed edge (sj , si) ∈ EM. We get that in the Markov chain, each edge is pointing from a node to all its possible activators. Therefore, a naive approach for converting the social grap… view at source ↗

**Figure 2.** Figure 2: The Markov chains that are obtained by applying the self loops method on the graphs [PITH_FULL_IMAGE:figures/full_fig_p008_2.png] view at source ↗

**Figure 3.** Figure 3: The Markov chain that is obtained by stage (1) of the no-loops method for the social [PITH_FULL_IMAGE:figures/full_fig_p011_3.png] view at source ↗

**Figure 4.** Figure 4: The average number of times in which the no-loops and the self-loops methods, using [PITH_FULL_IMAGE:figures/full_fig_p014_4.png] view at source ↗

**Figure 5.** Figure 5: The average number of times in which the no-loops and the self-loops methods, using [PITH_FULL_IMAGE:figures/full_fig_p015_5.png] view at source ↗

read the original abstract

Nowadays, the diffusion of information through social networks is a powerful phenomenon. One common way to model diffusions in social networks is the Independent Cascade (IC) model. Given a set of infected nodes according to the IC model, a natural problem is the source detection problem, in which the goal is to identify the unique node that has started the diffusion. Maximum Likelihood Estimation (MLE) is a common approach for tackling the source detection problem, but it is computationally hard. In this work, we propose an efficient method for the source detection problem under the MLE approach, which is based on computing the stationary distribution of a Markov chain. Using simulations, we demonstrate the effectiveness of our method compared to other state-of-the-art methods from the literature, both on random and real-world networks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper reduces MLE source detection to a Markov chain stationary distribution but supplies no derivation showing why that equals or approximates the likelihood under the IC model.

read the letter

The core claim is an efficient method for maximum-likelihood source detection that works by computing the stationary distribution of a constructed Markov chain. The abstract presents this as a practical fix for the hardness of direct MLE, and the simulations reportedly beat existing baselines on both random graphs and real networks. That reduction looks new relative to the cited literature, and turning a hard inference task into a standard linear-algebra step is the kind of move that can matter for applied work in epidemiology or misinformation tracking. Credit for trying to make the computation tractable without abandoning the MLE objective. The simulations are the only evidence offered, and the abstract says they demonstrate effectiveness, so the method at least passes a basic empirical check in the authors' hands. The main gap is exactly the one the stress-test flags: nothing in the provided text derives or even sketches why the stationary probabilities rank nodes by P(S | source = v). Without the explicit transition matrix or an argument that π_v is monotonic in the likelihood, it is impossible to know whether the procedure targets the stated goal, approximates it under some regime, or solves something else. Simulation details are also thin—no mention of parameter ranges, number of trials, variance, or precise baseline implementations—so the performance edge is hard to evaluate. The paper is aimed at people who need fast source-finding heuristics on moderate-sized networks. A reader already working on diffusion inference might pick up the idea and test the missing link themselves. It is coherent enough on its own terms to deserve referee time; the derivation and experimental reporting are the obvious items for revision.

Referee Report

2 major / 1 minor

Summary. The paper claims to introduce an efficient method for the source detection problem under the Maximum Likelihood Estimation (MLE) approach in the Independent Cascade (IC) model. The method computes the stationary distribution of a constructed Markov chain to identify the source node, with effectiveness shown via simulations on random and real-world networks outperforming other state-of-the-art methods.

Significance. If the stationary distribution is shown to solve or approximate the MLE objective, the approach would offer a scalable alternative to an otherwise intractable problem, with potential value in network epidemiology and social diffusion studies. The empirical comparisons provide some support, though the lack of analytic grounding reduces overall significance.

major comments (2)

[Abstract] Abstract: The claim that the method is 'based on computing the stationary distribution of a Markov chain' for the MLE approach lacks any derivation, explicit transition matrix definition, or argument establishing that the stationary distribution π_v is proportional to (or monotonic in) the likelihood P(S | source = v) under the IC model. This equivalence is load-bearing for the central claim.
[Simulations] Simulations (throughout results): Validation rests on simulations, yet no details are provided on error bars, number of Monte Carlo runs, exact IC model parameters (e.g., edge probabilities), or precise baselines, making it impossible to evaluate whether the reported superiority is robust or statistically significant.

minor comments (1)

Notation for the Markov chain construction (states, transitions) should be introduced with explicit definitions early in the manuscript to improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments. We address each major comment below and will revise the manuscript to incorporate the requested clarifications and details.

read point-by-point responses

Referee: [Abstract] Abstract: The claim that the method is 'based on computing the stationary distribution of a Markov chain' for the MLE approach lacks any derivation, explicit transition matrix definition, or argument establishing that the stationary distribution π_v is proportional to (or monotonic in) the likelihood P(S | source = v) under the IC model. This equivalence is load-bearing for the central claim.

Authors: The body of the manuscript defines the specific Markov chain (with explicit transition probabilities derived from the IC model) and provides the argument that its stationary distribution ranks nodes in a manner consistent with the MLE objective. We agree, however, that the abstract is overly terse and omits this justification. In the revision we will expand the abstract and add a dedicated subsection that explicitly states the transition matrix and the monotonicity argument linking π_v to the likelihood. revision: yes
Referee: [Simulations] Simulations (throughout results): Validation rests on simulations, yet no details are provided on error bars, number of Monte Carlo runs, exact IC model parameters (e.g., edge probabilities), or precise baselines, making it impossible to evaluate whether the reported superiority is robust or statistically significant.

Authors: We will augment the experimental section with the missing information: the number of Monte Carlo repetitions per setting, standard-error bars on all reported metrics, the precise edge-probability values (or distributions) used for each network, and the exact implementation details and parameter settings of every baseline method. revision: yes

Circularity Check

0 steps flagged

No circularity; Markov chain construction presented as independent algorithmic device

full rationale

The paper proposes an efficient method for source detection under MLE for the IC model by computing the stationary distribution of a Markov chain, with effectiveness demonstrated via simulations on random and real-world networks. No derivation chain is exhibited in the provided text that reduces the claimed result to its own inputs by construction, self-definition, fitted parameters renamed as predictions, or load-bearing self-citations. The construction is described as an independent device rather than one whose output is tautologically equivalent to the input likelihoods or fitted values. This is the normal case of a self-contained algorithmic proposal validated empirically.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the standard Independent Cascade diffusion assumption and on the unstated claim that a Markov-chain stationary distribution solves or approximates the MLE objective; no free parameters or invented entities are visible in the abstract.

axioms (2)

domain assumption Observed infections are generated exactly according to the Independent Cascade model
Explicitly stated as the modeling framework in the abstract.
ad hoc to paper The stationary distribution of the constructed Markov chain identifies the MLE source
This equivalence is the load-bearing step of the proposed method but receives no justification in the abstract.

pith-pipeline@v0.9.0 · 5661 in / 1243 out tokens · 18256 ms · 2026-05-24T04:38:03.101173+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

we propose an efficient method for the source detection problem under the MLE approach, which is based on computing the stationary distribution of a Markov chain... Markov chain tree theorem [13], allows us to compute the sum of the weights of all spanning trees rooted at each user in polynomial time
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Theorem 2. (Self-loops method) ... Ψi = α·Γi ... Theorem 3. (No-loops method) ... Πcorri = α · Γi

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

24 extracted references · 24 canonical work pages

[1]

Agaskar, A., and Lu, Y. M. A fast monte carlo algorithm for source localization on graphs. In Wavelets and Sparsity XV (2013), vol. 8858, SPIE, pp. 429–434

work page 2013
[2]

Contrasting the spread of misinformation in online social networks

Amoruso, M., Anello, D., Auletta, V., Cerulli, R., Ferraioli, D., and Raiconi, A. Contrasting the spread of misinformation in online social networks. Journal of Artificial Intelligence Research 69 (2020), 847–879

work page 2020
[3]

Batabyal, A. A. Markov chains: Models, algorithms and applications. Interfaces 36, 6 (2006), 609

work page 2006
[4]

N., and Myers, E

Gabow, H. N., and Myers, E. W. Finding all spanning trees of directed and undirected graphs. SIAM Journal on Computing 7 , 3 (1978), 280–287

work page 1978
[5]

Talk of the network: A complex systems look at the underlying process of word-of-mouth

Goldenberg, J., Libai, B., and Muller, E. Talk of the network: A complex systems look at the underlying process of word-of-mouth. Marketing letters 12 , 3 (2001), 211–223. 16

work page 2001
[6]

Inferring networks of diffusion and influence

Gomez-Rodriguez, M., Leskovec, J., and Krause, A. Inferring networks of diffusion and influence. ACM Transactions on Knowledge Discovery from Data (TKDD) 5 , 4 (2012), 1–37

work page 2012
[7]

Identifying propagation sources in networks: State-of-the-art and comparative studies

Jiang, J., Wen, S., Yu, S., Xiang, Y., and Zhou, W. Identifying propagation sources in networks: State-of-the-art and comparative studies. IEEE Communications Surveys & Tutorials 19, 1 (2016), 465–481

work page 2016
[8]

Schemes of propagation models and source estimators for rumor source detection in online social networks: A short survey of a decade of research

Jin, R., and Wu, W. Schemes of propagation models and source estimators for rumor source detection in online social networks: A short survey of a decade of research. Discrete Mathematics, Algorithms and Applications (2021), 2130002

work page 2021
[9]

Maximizing the spread of influence through a social network

Kempe, D., Kleinberg, J., and Tardos, ´E. Maximizing the spread of influence through a social network. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining (2003), ACM, pp. 137–146

work page 2003
[10]

Ct-ic: Continuously activated and time-restricted inde- pendent cascade model for viral marketing

Kim, J., Lee, W., and Yu, H. Ct-ic: Continuously activated and time-restricted inde- pendent cascade model for viral marketing. Knowledge-Based Systems 62 (2014), 57–68

work page 2014
[11]

S., and Karamchandani, N

Kumar, A., Borkar, V. S., and Karamchandani, N. Temporally agnostic rumor- source detection. IEEE Transactions on Signal and Information Processing over Networks 3, 2 (2017), 316–329

work page 2017
[12]

Finding effectors in social networks

Lappas, T., Terzi, E., Gunopulos, D., and Mannila, H. Finding effectors in social networks. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining (2010), pp. 1059–1068

work page 2010
[13]

Estimating a probability using finite memory

Leighton, F., and Rivest, R. Estimating a probability using finite memory. IEEE Transactions on Information Theory 32 , 6 (1986), 733–742

work page 1986
[14]

A., and Peres, Y

Levin, D. A., and Peres, Y. Markov chains and mixing times , vol. 107. American Mathematical Soc., 2017

work page 2017
[15]

Myung, I. J. Tutorial on maximum likelihood estimation. Journal of mathematical Psy- chology 47, 1 (2003), 90–100

work page 2003
[16]

Detecting sources of computer viruses in networks: theory and experiment

Shah, D., and Zaman, T. Detecting sources of computer viruses in networks: theory and experiment. In Proceedings of the ACM SIGMETRICS international conference on Measurement and modeling of computer systems (2010), pp. 203–214

work page 2010
[17]

Rumors in a network: Who’s the culprit? IEEE Transactions on information theory 57 , 8 (2011), 5163–5181

Shah, D., and Zaman, T. Rumors in a network: Who’s the culprit? IEEE Transactions on information theory 57 , 8 (2011), 5163–5181

work page 2011
[18]

A strong-connectivity algorithm and its applications in data flow analysis

Sharir, M. A strong-connectivity algorithm and its applications in data flow analysis. Computers & Mathematics with Applications 7 , 1 (1981), 67–72

work page 1981
[19]

Source detection of rumor in social network–a review

Shelke, S., and Attar, V. Source detection of rumor in social network–a review. Online Social Networks and Media 9 (2019), 30–42

work page 2019
[20]

A., Li, S., Wu, W., and Du, D.-Z

Tong, G. A., Li, S., Wu, W., and Du, D.-Z. Effector detection in social networks. IEEE Transactions on Computational Social Systems 3 , 4 (2016), 151–163

work page 2016
[21]

Multiple source detection without knowing the underlying propagation model

Wang, Z., Wang, C., Pei, J., and Ye, X. Multiple source detection without knowing the underlying propagation model. In Proceedings of the AAAI Conference on Artificial Intelligence (2017), vol. 31. 17

work page 2017
[22]

Cascade source inference in networks: a markov chain monte carlo approach

Zhai, X., Wu, W., and Xu, W. Cascade source inference in networks: a markov chain monte carlo approach. Computational social networks 2 , 1 (2015), 1–17

work page 2015
[23]

A markov chain monte carlo approach for source detection in networks

Zhang, L., Jin, T., Xu, T., Chang, B., Wang, Z., and Chen, E. A markov chain monte carlo approach for source detection in networks. In Chinese National Conference on Social Media Processing (2017), Springer, pp. 77–88

work page 2017
[24]

Catch’em all: Locating multiple diffusion sources in networks with partial observations

Zhu, K., Chen, Z., and Ying, L. Catch’em all: Locating multiple diffusion sources in networks with partial observations. In Proceedings of the AAAI Conference on Artificial Intelligence (2017), vol. 31. 18

work page 2017

[1] [1]

Agaskar, A., and Lu, Y. M. A fast monte carlo algorithm for source localization on graphs. In Wavelets and Sparsity XV (2013), vol. 8858, SPIE, pp. 429–434

work page 2013

[2] [2]

Contrasting the spread of misinformation in online social networks

Amoruso, M., Anello, D., Auletta, V., Cerulli, R., Ferraioli, D., and Raiconi, A. Contrasting the spread of misinformation in online social networks. Journal of Artificial Intelligence Research 69 (2020), 847–879

work page 2020

[3] [3]

Batabyal, A. A. Markov chains: Models, algorithms and applications. Interfaces 36, 6 (2006), 609

work page 2006

[4] [4]

N., and Myers, E

Gabow, H. N., and Myers, E. W. Finding all spanning trees of directed and undirected graphs. SIAM Journal on Computing 7 , 3 (1978), 280–287

work page 1978

[5] [5]

Talk of the network: A complex systems look at the underlying process of word-of-mouth

Goldenberg, J., Libai, B., and Muller, E. Talk of the network: A complex systems look at the underlying process of word-of-mouth. Marketing letters 12 , 3 (2001), 211–223. 16

work page 2001

[6] [6]

Inferring networks of diffusion and influence

Gomez-Rodriguez, M., Leskovec, J., and Krause, A. Inferring networks of diffusion and influence. ACM Transactions on Knowledge Discovery from Data (TKDD) 5 , 4 (2012), 1–37

work page 2012

[7] [7]

Identifying propagation sources in networks: State-of-the-art and comparative studies

Jiang, J., Wen, S., Yu, S., Xiang, Y., and Zhou, W. Identifying propagation sources in networks: State-of-the-art and comparative studies. IEEE Communications Surveys & Tutorials 19, 1 (2016), 465–481

work page 2016

[8] [8]

Schemes of propagation models and source estimators for rumor source detection in online social networks: A short survey of a decade of research

Jin, R., and Wu, W. Schemes of propagation models and source estimators for rumor source detection in online social networks: A short survey of a decade of research. Discrete Mathematics, Algorithms and Applications (2021), 2130002

work page 2021

[9] [9]

Maximizing the spread of influence through a social network

Kempe, D., Kleinberg, J., and Tardos, ´E. Maximizing the spread of influence through a social network. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining (2003), ACM, pp. 137–146

work page 2003

[10] [10]

Ct-ic: Continuously activated and time-restricted inde- pendent cascade model for viral marketing

Kim, J., Lee, W., and Yu, H. Ct-ic: Continuously activated and time-restricted inde- pendent cascade model for viral marketing. Knowledge-Based Systems 62 (2014), 57–68

work page 2014

[11] [11]

S., and Karamchandani, N

Kumar, A., Borkar, V. S., and Karamchandani, N. Temporally agnostic rumor- source detection. IEEE Transactions on Signal and Information Processing over Networks 3, 2 (2017), 316–329

work page 2017

[12] [12]

Finding effectors in social networks

Lappas, T., Terzi, E., Gunopulos, D., and Mannila, H. Finding effectors in social networks. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining (2010), pp. 1059–1068

work page 2010

[13] [13]

Estimating a probability using finite memory

Leighton, F., and Rivest, R. Estimating a probability using finite memory. IEEE Transactions on Information Theory 32 , 6 (1986), 733–742

work page 1986

[14] [14]

A., and Peres, Y

Levin, D. A., and Peres, Y. Markov chains and mixing times , vol. 107. American Mathematical Soc., 2017

work page 2017

[15] [15]

Myung, I. J. Tutorial on maximum likelihood estimation. Journal of mathematical Psy- chology 47, 1 (2003), 90–100

work page 2003

[16] [16]

Detecting sources of computer viruses in networks: theory and experiment

Shah, D., and Zaman, T. Detecting sources of computer viruses in networks: theory and experiment. In Proceedings of the ACM SIGMETRICS international conference on Measurement and modeling of computer systems (2010), pp. 203–214

work page 2010

[17] [17]

Rumors in a network: Who’s the culprit? IEEE Transactions on information theory 57 , 8 (2011), 5163–5181

Shah, D., and Zaman, T. Rumors in a network: Who’s the culprit? IEEE Transactions on information theory 57 , 8 (2011), 5163–5181

work page 2011

[18] [18]

A strong-connectivity algorithm and its applications in data flow analysis

Sharir, M. A strong-connectivity algorithm and its applications in data flow analysis. Computers & Mathematics with Applications 7 , 1 (1981), 67–72

work page 1981

[19] [19]

Source detection of rumor in social network–a review

Shelke, S., and Attar, V. Source detection of rumor in social network–a review. Online Social Networks and Media 9 (2019), 30–42

work page 2019

[20] [20]

A., Li, S., Wu, W., and Du, D.-Z

Tong, G. A., Li, S., Wu, W., and Du, D.-Z. Effector detection in social networks. IEEE Transactions on Computational Social Systems 3 , 4 (2016), 151–163

work page 2016

[21] [21]

Multiple source detection without knowing the underlying propagation model

Wang, Z., Wang, C., Pei, J., and Ye, X. Multiple source detection without knowing the underlying propagation model. In Proceedings of the AAAI Conference on Artificial Intelligence (2017), vol. 31. 17

work page 2017

[22] [22]

Cascade source inference in networks: a markov chain monte carlo approach

Zhai, X., Wu, W., and Xu, W. Cascade source inference in networks: a markov chain monte carlo approach. Computational social networks 2 , 1 (2015), 1–17

work page 2015

[23] [23]

A markov chain monte carlo approach for source detection in networks

Zhang, L., Jin, T., Xu, T., Chang, B., Wang, Z., and Chen, E. A markov chain monte carlo approach for source detection in networks. In Chinese National Conference on Social Media Processing (2017), Springer, pp. 77–88

work page 2017

[24] [24]

Catch’em all: Locating multiple diffusion sources in networks with partial observations

Zhu, K., Chen, Z., and Ying, L. Catch’em all: Locating multiple diffusion sources in networks with partial observations. In Proceedings of the AAAI Conference on Artificial Intelligence (2017), vol. 31. 18

work page 2017