Random Walk Learning and the Pac-Man Attack

arxiv: 2508.05663 · v4 · submitted 2025-08-01 · 📊 stat.ML · cs.CR· cs.LG· cs.SY· eess.SY

Random Walk Learning and the Pac-Man Attack

Xingran Chen , Parimal Parag , Rohit Bhagat , Zonghong Liu , Salim El Rouayheb This is my paper

Pith reviewed 2026-05-19 02:01 UTC · model grok-4.3

classification 📊 stat.ML cs.CRcs.LGcs.SYeess.SY

keywords random walksPac-Man attackdecentralized learningstochastic gradient descentadversarial robustnesspopulation dynamicsphase transition

0 comments p. Extension

The pith

The Average Crossing algorithm keeps random walk populations bounded and learning convergent under Pac-Man attacks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Random walk algorithms suit decentralized learning for their scalability and low overhead. Yet they face the Pac-Man attack in which malicious nodes terminate walks that visit them, eventually halting the process. The proposed Average Crossing algorithm duplicates walks locally when their number exceeds a threshold to replenish the population. Analysis proves the walks remain bounded almost surely and gradient descent converges with limited error from optimum. Simulations show a sharp drop in extinction risk once the threshold crosses a critical point.

Core claim

Our theoretical analysis establishes that the RW population remains almost surely bounded under AC and RW-based stochastic gradient descent remains convergent under AC, even in the presence of Pac-Man, with a quantifiable deviation from the true optimum.

What carries the argument

Average Crossing (AC) algorithm, a fully decentralized duplication rule triggered when local random walk count exceeds a threshold to offset terminations by malicious nodes.

Load-bearing premise

The analysis depends on a specific probabilistic model of the attack in which malicious nodes terminate random walks upon visit and on the ability to set the duplication threshold in a fully decentralized manner without global coordination.

What would settle it

An observation or experiment showing the random walk population growing unbounded or the learning process diverging when Average Crossing is applied would falsify the central claims.

Figures

Figures reproduced from arXiv: 2508.05663 by Parimal Parag, Rohit Bhagat, Salim El Rouayheb, Xingran Chen, Zonghong Liu.

**Figure 1.** Figure 1: Illustration of a chain of RWs. Node u duplicates a new RW (green) by copying the purple RW. Node v duplicates a new RW (orange) by copying the green RW. The blue trajectory shows how these RWs are connected over time, forming a chain. and ν (ζ) is defined in Definition 6 in Appendix A. Proof. The full proof is given in Appendix D. We treat a chain of RWs as a single effective RW. The corresponding effecti… view at source ↗

**Figure 2.** Figure 2: Behaviors of {Zt}t⩾0 on the complete graph. on the graph topology), extinction occurs with probability 1. Conversely, when A falls below this critical threshold, the extinction probability drops sharply and approaches zero for small values of A. C. Convergence In [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: Loss function v.s. learning steps on different graphs. [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗

**Figure 5.** Figure 5: Number of RWs over time on different graphs. [PITH_FULL_IMAGE:figures/full_fig_p015_5.png] view at source ↗

**Figure 4.** Figure 4: Loss v.s. learning steps on a complete graph: compar [PITH_FULL_IMAGE:figures/full_fig_p015_4.png] view at source ↗

**Figure 6.** Figure 6: Extinction probability vs. threshold A under different graphs. E. Convergence We begin with experiments on the synthetic dataset [PITH_FULL_IMAGE:figures/full_fig_p015_6.png] view at source ↗

**Figure 8.** Figure 8: Loss function v.s. learning steps on different graphs [PITH_FULL_IMAGE:figures/full_fig_p016_8.png] view at source ↗

**Figure 7.** Figure 7: Loss function v.s. learning steps on different graphs. [PITH_FULL_IMAGE:figures/full_fig_p016_7.png] view at source ↗

**Figure 9.** Figure 9: Loss function v.s. learning steps on different graphs [PITH_FULL_IMAGE:figures/full_fig_p016_9.png] view at source ↗

read the original abstract

Random walk (RW)-based algorithms have long been popular in distributed systems due to low overheads and scalability, with recent growing applications in decentralized learning. However, their reliance on local interactions makes them inherently vulnerable to malicious behavior. In this work, we investigate an adversarial threat that we term the ``Pac-Man'' attack, in which a malicious node probabilistically terminates any RW that visits it. This stealthy behavior gradually eliminates active RWs from the network, effectively halting the learning process without triggering failure alarms. To counter this threat, we propose the Average Crossing (AC) algorithm--a fully decentralized mechanism for duplicating RWs to prevent RW extinction in the presence of Pac-Man. Our theoretical analysis establishes that (i) the RW population remains almost surely bounded under AC and (ii) RW-based stochastic gradient descent remains convergent under AC, even in the presence of Pac-Man, with a quantifiable deviation from the true optimum. Our extensive empirical results on both synthetic and real-world datasets corroborate our theoretical findings. Furthermore, they uncover a phase transition in the extinction probability as a function of the duplication threshold. We offer theoretical insights by analyzing a simplified variant of the AC, which sheds light on the observed phase transition.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper defines a Pac-Man attack that quietly kills random walks in decentralized learning and counters it with a local Average Crossing duplication rule, plus claims of almost-sure bounded population and convergent SGD with bounded deviation.

read the letter

The main point is that random-walk methods for distributed learning are open to a simple attack where bad nodes just terminate walks on contact, and the authors give a fully local duplication scheme to keep the walks from dying out entirely. They call the attack Pac-Man and the fix Average Crossing, where a walk gets copied once a local threshold on visits or crossings is hit. The theory says the total population stays almost surely bounded under this rule and that the underlying SGD still converges, only with some extra deviation from the true optimum that they quantify. Experiments on synthetic graphs and real datasets back the claims and show a clear phase transition in extinction probability as the duplication threshold varies. They also work through a simplified model to explain why that transition appears. That combination of attack model, countermeasure, and both theory and experiments is what is new here relative to the RW learning papers they cite. The setup is clean and the empirical phase-transition result is a concrete takeaway that could guide parameter choice in practice. The soft spot is the decentralization of the threshold. Nodes set it from local observations alone, so the effective duplication rate can vary across the graph. Almost-sure boundedness arguments usually need a uniform lower bound on the branching factor to rule out local extinction with positive probability. If the paper's proof does not add an extra step to restore uniformity from purely local decisions, that part of the claim would need tightening. The deviation bound is also tied to the threshold, so readers will want to see how sensitive the guarantee is to realistic local choices. This is for people working on scalable decentralized optimization or on security for gossip-style and walk-based protocols. A reader who already uses random walks in sensor nets or peer-to-peer systems would get direct value from the attack definition and the duplication idea. I would send it to peer review. The core contribution is specific enough to check and the empirical observations are worth having in the literature even if the uniformity step in the proof needs more detail.

Referee Report

2 major / 2 minor

Summary. The paper introduces the 'Pac-Man' attack, in which malicious nodes probabilistically terminate random walks (RWs) visiting them, thereby eroding the active RW population and halting RW-based decentralized learning. To counter this, the authors propose the Average Crossing (AC) algorithm, a fully decentralized duplication mechanism triggered when a local threshold on observed crossings is met. The central theoretical claims are that (i) the RW population remains almost surely bounded under AC despite Pac-Man nodes, and (ii) RW-based stochastic gradient descent converges to a neighborhood of the true optimum with an explicit, quantifiable deviation bound. Empirical evaluations on synthetic graphs and real-world datasets are reported to corroborate the boundedness and convergence claims while revealing a phase transition in extinction probability as a function of the duplication threshold; a simplified AC variant is analyzed to explain the transition.

Significance. If the almost-sure boundedness and convergence results hold under the stated decentralized threshold rule, the work supplies a concrete, attack-specific robustness guarantee for a class of low-overhead distributed learning algorithms. The explicit deviation bound and the identification of a phase transition constitute falsifiable predictions that strengthen the contribution beyond purely empirical defenses. These elements could inform the design of other RW-based protocols in adversarial settings.

major comments (2)

[§4] §4 (Theoretical Analysis of Boundedness): The almost-sure boundedness claim for the RW population under AC rests on the duplication threshold being chosen from purely local statistics. Standard branching-process or martingale arguments for extinction probability require a uniform lower bound on the duplication rate across nodes. The manuscript does not appear to supply a lemma showing that local threshold selection restores the necessary uniformity or that heterogeneity cannot create subgraphs with effective branching factor <1. This is load-bearing for claim (i).
[§5] §5 (Convergence of RW-SGD under AC): The quantifiable deviation from the true optimum is stated to be explicit, yet the derivation appears to treat the effective step-size and mixing time after duplication as constants independent of the realized RW population size. If the population bound in §4 is only almost-sure rather than in expectation with explicit moments, the deviation bound may require an additional uniform integrability argument that is not visible in the provided outline.

minor comments (2)

[Abstract / §5] The abstract and introduction use 'quantifiable deviation' without a forward reference to the precise expression (e.g., the constant multiplying the variance term). Adding an equation number in the abstract or a parenthetical in §5 would improve readability.
[Empirical Evaluation] Figure captions for the phase-transition plots should explicitly state the number of Monte-Carlo trials and the precise definition of 'extinction' used (e.g., population dropping below 1 for T consecutive steps).

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive feedback on our manuscript. The comments identify important points in the theoretical analysis that we address below. We outline revisions that will strengthen the presentation without altering the core claims.

read point-by-point responses

Referee: [§4] §4 (Theoretical Analysis of Boundedness): The almost-sure boundedness claim for the RW population under AC rests on the duplication threshold being chosen from purely local statistics. Standard branching-process or martingale arguments for extinction probability require a uniform lower bound on the duplication rate across nodes. The manuscript does not appear to supply a lemma showing that local threshold selection restores the necessary uniformity or that heterogeneity cannot create subgraphs with effective branching factor <1. This is load-bearing for claim (i).

Authors: We agree that an explicit argument for uniformity is desirable to make the branching-process comparison fully rigorous. The local threshold rule, by averaging observed crossings, induces a duplication rate that is bounded away from zero uniformly across nodes because the random walk is ergodic on the finite graph; this prevents persistent subcritical subgraphs. To address the referee's concern directly, the revised manuscript will include a new supporting lemma (Lemma 4.2) that couples the heterogeneous process to a homogeneous branching process with branching factor strictly greater than one, using the fact that the threshold is a consistent estimator of the local Pac-Man density. revision: yes
Referee: [§5] §5 (Convergence of RW-SGD under AC): The quantifiable deviation from the true optimum is stated to be explicit, yet the derivation appears to treat the effective step-size and mixing time after duplication as constants independent of the realized RW population size. If the population bound in §4 is only almost-sure rather than in expectation with explicit moments, the deviation bound may require an additional uniform integrability argument that is not visible in the provided outline.

Authors: The referee is correct that an almost-sure population bound alone does not automatically yield an explicit, non-random deviation without additional justification. Because the graph is finite, the mixing time is deterministically bounded once the population size is fixed; the almost-sure bound therefore implies a uniform (random but finite) bound on all relevant constants. In the revision we will insert a short uniform-integrability argument (based on the bounded convergence theorem applied to the finite-state Markov chain) immediately after the statement of the almost-sure bound, thereby justifying the interchange of limit and expectation in the deviation expression. revision: yes

Circularity Check

0 steps flagged

No circularity detected in theoretical derivation chain

full rationale

The paper presents its core results on almost-sure boundedness of the random walk population and convergence of RW-based SGD under the Average Crossing (AC) mechanism as outcomes of a probabilistic analysis of the Pac-Man attack model. The abstract and description frame these as independent theoretical guarantees with a quantifiable deviation, supported by a simplified variant analysis for the observed phase transition. No quoted equations or steps reduce the claims by construction to fitted inputs, self-definitions, or load-bearing self-citations; the duplication threshold is described as a decentralized design choice whose properties are analyzed rather than presupposed to match the result. The derivation chain therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 2 invented entities

The central claims rest on a new attack model and a new duplication mechanism, plus standard assumptions about random walk movement on networks and probabilistic termination. One free parameter is the duplication threshold that controls the phase transition.

free parameters (1)

duplication threshold
Parameter controlling when RWs duplicate under AC; appears central to the observed phase transition in extinction probability.

axioms (1)

domain assumption Random walks move on a connected network according to standard transition probabilities
Invoked implicitly as the basis for RW-based algorithms in distributed systems.

invented entities (2)

Pac-Man attack no independent evidence
purpose: Model of malicious node that probabilistically terminates visiting RWs
Newly defined adversarial behavior to capture stealthy elimination of RWs.
Average Crossing (AC) algorithm no independent evidence
purpose: Decentralized duplication rule to prevent RW extinction
Proposed mechanism whose properties are analyzed.

pith-pipeline@v0.9.0 · 5765 in / 1303 out tokens · 40337 ms · 2026-05-19T02:01:39.805417+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Theorem 1 (Boundedness): lim sup Z_t < ∞ a.s. ... supermartingale M_k = V(Z_{t0+dk})

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

39 extracted references · 39 canonical work pages · 1 internal anchor

[1]

Random walks on graphs,

L. Lovász, “Random walks on graphs,”Combinatorics, Paul Erd˝ os is eighty, vol. 2, pp. 1–46, 1993

work page 1993
[2]

D. A. Levin and Y . Peres,Markov Chains and Mixing Times. American Mathematical Society, second ed., 2017

work page 2017
[3]

A unified theory of decentralized SGD with changing topology and local updates,

A. Koloskova, N. Loizou, S. Boreiri, M. Jaggi, and S. Stich, “A unified theory of decentralized SGD with changing topology and local updates,” inProceedings of the 37th International Conference on Machine Learn- ing, vol. 119, pp. 5381–5393, 2020

work page 2020
[4]

The pagerank citation ranking: Bringing order to the web,

L. Page, S. Brin, R. Motwani, and T. Winograd, “The pagerank citation ranking: Bringing order to the web,” technical report, Stanford InfoLab, 1999

work page 1999
[5]

Random-walk com- putation of similarities between nodes of a graph with application to collaborative recommendation,

F. Fouss, A. Pirotte, J. Renders, and M. Saerens, “Random-walk com- putation of similarities between nodes of a graph with application to collaborative recommendation,”IEEE Transactions on knowledge and data engineering, vol. 19, no. 3, pp. 355–368, 2007

work page 2007
[6]

Smartwalk: Enhancing social network security via adaptive random walks,

Y . Liu, S. Ji, and P. Mittal, “Smartwalk: Enhancing social network security via adaptive random walks,” inProceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 492–503, 2016

work page 2016
[7]

Supervised random walks: predicting and recommending links in social networks,

L. Backstrom and J. Leskovec, “Supervised random walks: predicting and recommending links in social networks,” inProceedings of the fourth ACM international conference on Web search and data mining, pp. 635–644, 2011

work page 2011
[8]

Het- erogeneous graph neural network,

C. Zhang, D. Song, C. Huang, A. Swami, and N. Chawla, “Het- erogeneous graph neural network,” inProceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, pp. 793–803, 2019

work page 2019
[9]

A simple peer-to-peer algorithm for distributed optimization in sensor networks,

B. Johansson, M. Rabi, and M. Johansson, “A simple peer-to-peer algorithm for distributed optimization in sensor networks,” in46th IEEE Conference on Decision and Control, pp. 4705–4710, 2007

work page 2007
[10]

Walk for learning: A random walk approach for federated learning from heterogeneous data,

G. Ayache, V . Dassari, and S. E. Rouayheb, “Walk for learning: A random walk approach for federated learning from heterogeneous data,” IEEE Journal on Selected Areas in Communications, vol. 41, no. 4, pp. 929–940, 2023

work page 2023
[11]

Coupled-space attacks against random-walk-based anomaly detection,

Y . Lai, M. Waniek, L. Li, J. Wu, Y . Zhu, T. P. Michalak, T. Rahwan, and K. Zhou, “Coupled-space attacks against random-walk-based anomaly detection,”IEEE Transactions on Information Forensics and Security, vol. 19, pp. 9315–9329, 2024

work page 2024
[12]

Enhancing sybil detection via social-activity networks: A random walk approach,

X. Zhang, H. Xie, P. Yi, and J. Lui, “Enhancing sybil detection via social-activity networks: A random walk approach,”IEEE Transactions on Dependable and Secure Computing, vol. 20, no. 2, pp. 1213–1227, 2023

work page 2023
[13]

Self-duplicating random walks for resilient decentralized learning on graphs,

M. Egger, G. Ayache, R. Bitar, A. Wachter-Zeh, and S. E. Rouayheb, “Self-duplicating random walks for resilient decentralized learning on graphs,” inGLOBECOM 2024 - 2024 IEEE Global Communications Conference, pp. 2960–2965, 2024

work page 2024
[14]

Self-regulating random walks for resilient decentralized learning on graphs,

M. Egger, R. Bitar, G. Ayache, A. Wachter-Zeh, and S. E. Rouayheb, “Self-regulating random walks for resilient decentralized learning on graphs,”arXiv preprint arXiv:2407.11762, 2024. Revised February 10, 2025

work page arXiv 2024
[15]

On markov chain gradient descent,

T. Sun, Y . Sun, and W. Yin, “On markov chain gradient descent,” Advances in neural information processing systems, vol. 31, 2018

work page 2018
[16]

Walkman: A communication-efficient random-walk algorithm for decentralized opti- mization,

X. Mao, K. Yuan, Y . Hu, Y . Gu, A. H. Sayed, and W. Yin, “Walkman: A communication-efficient random-walk algorithm for decentralized opti- mization,”IEEE Transactions on Signal Processing, vol. 68, pp. 2513– 2528, 2020

work page 2020
[17]

Implementing fault-tolerant services using the state ma- chine approach: A tutorial,

F. B. Schneider, “Implementing fault-tolerant services using the state ma- chine approach: A tutorial,”ACM Computing Surveys (CSUR), vol. 22, no. 4, pp. 299–319, 1990

work page 1990
[18]

The part-time parliament,

L. Lamport, “The part-time parliament,”ACM Transactions on Computer Systems (TOCS), vol. 16, no. 2, pp. 133–169, 1998

work page 1998
[19]

The byzantine generals prob- lem,

L. Lamport, R. Shostak, and M. Pease, “The byzantine generals prob- lem,”ACM Transactions on Programming Languages and Systems (TOPLAS), vol. 4, no. 3, pp. 382–401, 1982

work page 1982
[20]

A survey of rollback-recovery protocols in message-passing systems,

E. N. Elnozahy, L. Alvisi, Y . Wang, and D. B. Johnson, “A survey of rollback-recovery protocols in message-passing systems,”ACM Comput- ing Surveys (CSUR), vol. 34, no. 3, pp. 375–408, 2002

work page 2002
[21]

Unreliable failure detectors for reliable distributed systems,

T. D. Chandra and S. Toueg, “Unreliable failure detectors for reliable distributed systems,”Journal of the ACM (JACM), vol. 43, no. 2, pp. 225–267, 1996

work page 1996
[22]

Stochastic gradient descent under markovian sampling schemes,

M. Even, “Stochastic gradient descent under markovian sampling schemes,” inProceedings of the 40th International Conference on Machine Learning, 2023

work page 2023
[23]

A randomized incremental subgradient method for distributed optimization in networked systems,

B. Johansson, M. Rabi, and M. Johansson, “A randomized incremental subgradient method for distributed optimization in networked systems,” SIAM Journal on Optimization, vol. 20, no. 3, pp. 1157–1170, 2010

work page 2010
[24]

On markov chain gradient descent,

T. Sun, Y . Sun, and W. Yin, “On markov chain gradient descent,” in Proceedings of the 32nd International Conference on Neural Information Processing Systems, pp. 9918–9927, 2018

work page 2018
[25]

The mnist database of handwritten digit images for machine learning research,

L. Deng, “The mnist database of handwritten digit images for machine learning research,”IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 141 – 142, 2012

work page 2012
[26]

Durrett,Probability: Theory and Examples

R. Durrett,Probability: Theory and Examples. Thomson, Brooks Cole, 2019

work page 2019
[27]

Yaglom limits can depend on the starting state,

R. D. Foley and D. R. McDonald, “Yaglom limits can depend on the starting state,”Journal of Applied Probability, vol. 54, no. 3, pp. 726– 734, 2017

work page 2017
[28]

Collet, S

P. Collet, S. Martínez, and J. S. Martín,Quasi-Stationary Distributions: Markov Chains, Diffusions and Dynamical Systems. Probability and Its Applications, Springer, 2012

work page 2012
[29]

Billingsley,Probability and Measure

P. Billingsley,Probability and Measure. Wiley Series in Probability and Mathematical Statistics, John Wiley & Sons Inc., 3rd ed., 1995

work page 1995
[30]

On quasi-stationary distributions in absorbing discrete-time finite markov chains,

J. N. Darroch and E. Seneta, “On quasi-stationary distributions in absorbing discrete-time finite markov chains,”Journal of Applied Prob- ability, vol. 2, no. 1, pp. 88–100, 1965

work page 1965
[31]

Boyd and L

S. Boyd and L. Vandenberghe,Convex Optimization. Cambridge University Press, 2004

work page 2004
[32]

The entrapment problem in random walk decentralized learning,

Z. Liu, S. Rouayheb, and M. Dwyer, “The entrapment problem in random walk decentralized learning,” inIEEE ISIT, 2024

work page 2024
[33]

Expander graphs and their applications,

S. Hoory, N. Linial, and A. Wigderson, “Expander graphs and their applications,”Bulletin of the American Mathematical Society, vol. 43, no. 4, pp. 439–561, 2006

work page 2006
[34]

Measuring the Effects of Non-Identical Data Distribution for Federated Visual Classification

T. H. Hsu, H. Qi, and M. Brown, “Measuring the effects of non- identical data distribution for federated visual classification,” 2019. arXiv:1909.06335. APPENDIXA PRELIMINARIES: NOTATIONS, DEFINITIONS,AND ASSUMPTIONS In this Appendix, we introduce the notation, definitions, and assumptions used in the system model and its transition dynamics. Definition 3...

work page internal anchor Pith review Pith/arXiv arXiv 2019
[35]

For simplicity in analysis, conditional on the eventZ t =z, we re-label the active RWs inZ t as[z]without loss of generality

Expression for∆ t(A, z):In this subsection, we first derive a closed-form expression for∆ t(A, z). For simplicity in analysis, conditional on the eventZ t =z, we re-label the active RWs inZ t as[z]without loss of generality. That is, we assign eachj∈ Z t a new index in[z]via a one-to- one mapping. Accordingly, the associated variablesL (u,j) t and H (u,j)...

work page
[36]

Proof:LetA⩾¯α

Proof of Part (a):Since each RW hits the Pac-Man at node1with probability 1 N at every step, and is killed with probabilityζ= 1upon hitting it, then, given the number of active RWsZ t =z, the expected number of RWs that die at timetis z N .(34) From (22) and (34), given the number of active RWsZ t =z, we derive E[Z t+1 |Z t =z] =z 1− 1 N + ∆t(A, z) z ≜zm ...

work page
[37]

Proof of Part (b):Now, we re-visit the recursion: E Zt+1 |Z t =z =zm t(A, z), wherem t(A, z)is defined in (35): mt(A, z)≜1− 1 N + ∆t(A, z) z . Lemma 4.If there exists a positive integerα , such that when A⩽α , the following inequality holds: inf t⩾0 inf z∈Z+ ∆t(A, z) z > 1 N .(42) Then, for anyA⩽α , we have Pr (∃t0,∀t⩾t 0, Z t = 0|Z 0 =z 0)<1 for allz 0. ...

work page
[38]

Whenζ= 1, the absorbing stateA={1, w}, so ˜πt = [0, πchain,t], whereπ chain;t is a discrete distribution supported on a finite set of sizeN, and lim t→∞ ˜πt = [0, ν(1)]

work page
[39]

small” and “large

When0< ζ <1, the absorbing stateA={w}, so˜π t = πchain,t, whereπ chain;t is a discrete distribution supported on a finite set of sizeN+ 1, and lim t→∞ ˜πt =ν (ζ). APPENDIXE THE EFFECTIVE TRANSITION PROBABILITY MATRIXP (ζ) CHAIN In fact, as discussed before, we condense the time interval between the termination of the last RW and the creation of the next R...

work page

[1] [1]

Random walks on graphs,

L. Lovász, “Random walks on graphs,”Combinatorics, Paul Erd˝ os is eighty, vol. 2, pp. 1–46, 1993

work page 1993

[2] [2]

D. A. Levin and Y . Peres,Markov Chains and Mixing Times. American Mathematical Society, second ed., 2017

work page 2017

[3] [3]

A unified theory of decentralized SGD with changing topology and local updates,

A. Koloskova, N. Loizou, S. Boreiri, M. Jaggi, and S. Stich, “A unified theory of decentralized SGD with changing topology and local updates,” inProceedings of the 37th International Conference on Machine Learn- ing, vol. 119, pp. 5381–5393, 2020

work page 2020

[4] [4]

The pagerank citation ranking: Bringing order to the web,

L. Page, S. Brin, R. Motwani, and T. Winograd, “The pagerank citation ranking: Bringing order to the web,” technical report, Stanford InfoLab, 1999

work page 1999

[5] [5]

Random-walk com- putation of similarities between nodes of a graph with application to collaborative recommendation,

F. Fouss, A. Pirotte, J. Renders, and M. Saerens, “Random-walk com- putation of similarities between nodes of a graph with application to collaborative recommendation,”IEEE Transactions on knowledge and data engineering, vol. 19, no. 3, pp. 355–368, 2007

work page 2007

[6] [6]

Smartwalk: Enhancing social network security via adaptive random walks,

Y . Liu, S. Ji, and P. Mittal, “Smartwalk: Enhancing social network security via adaptive random walks,” inProceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 492–503, 2016

work page 2016

[7] [7]

Supervised random walks: predicting and recommending links in social networks,

L. Backstrom and J. Leskovec, “Supervised random walks: predicting and recommending links in social networks,” inProceedings of the fourth ACM international conference on Web search and data mining, pp. 635–644, 2011

work page 2011

[8] [8]

Het- erogeneous graph neural network,

C. Zhang, D. Song, C. Huang, A. Swami, and N. Chawla, “Het- erogeneous graph neural network,” inProceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, pp. 793–803, 2019

work page 2019

[9] [9]

A simple peer-to-peer algorithm for distributed optimization in sensor networks,

B. Johansson, M. Rabi, and M. Johansson, “A simple peer-to-peer algorithm for distributed optimization in sensor networks,” in46th IEEE Conference on Decision and Control, pp. 4705–4710, 2007

work page 2007

[10] [10]

Walk for learning: A random walk approach for federated learning from heterogeneous data,

G. Ayache, V . Dassari, and S. E. Rouayheb, “Walk for learning: A random walk approach for federated learning from heterogeneous data,” IEEE Journal on Selected Areas in Communications, vol. 41, no. 4, pp. 929–940, 2023

work page 2023

[11] [11]

Coupled-space attacks against random-walk-based anomaly detection,

Y . Lai, M. Waniek, L. Li, J. Wu, Y . Zhu, T. P. Michalak, T. Rahwan, and K. Zhou, “Coupled-space attacks against random-walk-based anomaly detection,”IEEE Transactions on Information Forensics and Security, vol. 19, pp. 9315–9329, 2024

work page 2024

[12] [12]

Enhancing sybil detection via social-activity networks: A random walk approach,

X. Zhang, H. Xie, P. Yi, and J. Lui, “Enhancing sybil detection via social-activity networks: A random walk approach,”IEEE Transactions on Dependable and Secure Computing, vol. 20, no. 2, pp. 1213–1227, 2023

work page 2023

[13] [13]

Self-duplicating random walks for resilient decentralized learning on graphs,

M. Egger, G. Ayache, R. Bitar, A. Wachter-Zeh, and S. E. Rouayheb, “Self-duplicating random walks for resilient decentralized learning on graphs,” inGLOBECOM 2024 - 2024 IEEE Global Communications Conference, pp. 2960–2965, 2024

work page 2024

[14] [14]

Self-regulating random walks for resilient decentralized learning on graphs,

M. Egger, R. Bitar, G. Ayache, A. Wachter-Zeh, and S. E. Rouayheb, “Self-regulating random walks for resilient decentralized learning on graphs,”arXiv preprint arXiv:2407.11762, 2024. Revised February 10, 2025

work page arXiv 2024

[15] [15]

On markov chain gradient descent,

T. Sun, Y . Sun, and W. Yin, “On markov chain gradient descent,” Advances in neural information processing systems, vol. 31, 2018

work page 2018

[16] [16]

Walkman: A communication-efficient random-walk algorithm for decentralized opti- mization,

X. Mao, K. Yuan, Y . Hu, Y . Gu, A. H. Sayed, and W. Yin, “Walkman: A communication-efficient random-walk algorithm for decentralized opti- mization,”IEEE Transactions on Signal Processing, vol. 68, pp. 2513– 2528, 2020

work page 2020

[17] [17]

Implementing fault-tolerant services using the state ma- chine approach: A tutorial,

F. B. Schneider, “Implementing fault-tolerant services using the state ma- chine approach: A tutorial,”ACM Computing Surveys (CSUR), vol. 22, no. 4, pp. 299–319, 1990

work page 1990

[18] [18]

The part-time parliament,

L. Lamport, “The part-time parliament,”ACM Transactions on Computer Systems (TOCS), vol. 16, no. 2, pp. 133–169, 1998

work page 1998

[19] [19]

The byzantine generals prob- lem,

L. Lamport, R. Shostak, and M. Pease, “The byzantine generals prob- lem,”ACM Transactions on Programming Languages and Systems (TOPLAS), vol. 4, no. 3, pp. 382–401, 1982

work page 1982

[20] [20]

A survey of rollback-recovery protocols in message-passing systems,

E. N. Elnozahy, L. Alvisi, Y . Wang, and D. B. Johnson, “A survey of rollback-recovery protocols in message-passing systems,”ACM Comput- ing Surveys (CSUR), vol. 34, no. 3, pp. 375–408, 2002

work page 2002

[21] [21]

Unreliable failure detectors for reliable distributed systems,

T. D. Chandra and S. Toueg, “Unreliable failure detectors for reliable distributed systems,”Journal of the ACM (JACM), vol. 43, no. 2, pp. 225–267, 1996

work page 1996

[22] [22]

Stochastic gradient descent under markovian sampling schemes,

M. Even, “Stochastic gradient descent under markovian sampling schemes,” inProceedings of the 40th International Conference on Machine Learning, 2023

work page 2023

[23] [23]

A randomized incremental subgradient method for distributed optimization in networked systems,

B. Johansson, M. Rabi, and M. Johansson, “A randomized incremental subgradient method for distributed optimization in networked systems,” SIAM Journal on Optimization, vol. 20, no. 3, pp. 1157–1170, 2010

work page 2010

[24] [24]

On markov chain gradient descent,

T. Sun, Y . Sun, and W. Yin, “On markov chain gradient descent,” in Proceedings of the 32nd International Conference on Neural Information Processing Systems, pp. 9918–9927, 2018

work page 2018

[25] [25]

The mnist database of handwritten digit images for machine learning research,

L. Deng, “The mnist database of handwritten digit images for machine learning research,”IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 141 – 142, 2012

work page 2012

[26] [26]

Durrett,Probability: Theory and Examples

R. Durrett,Probability: Theory and Examples. Thomson, Brooks Cole, 2019

work page 2019

[27] [27]

Yaglom limits can depend on the starting state,

R. D. Foley and D. R. McDonald, “Yaglom limits can depend on the starting state,”Journal of Applied Probability, vol. 54, no. 3, pp. 726– 734, 2017

work page 2017

[28] [28]

Collet, S

P. Collet, S. Martínez, and J. S. Martín,Quasi-Stationary Distributions: Markov Chains, Diffusions and Dynamical Systems. Probability and Its Applications, Springer, 2012

work page 2012

[29] [29]

Billingsley,Probability and Measure

P. Billingsley,Probability and Measure. Wiley Series in Probability and Mathematical Statistics, John Wiley & Sons Inc., 3rd ed., 1995

work page 1995

[30] [30]

On quasi-stationary distributions in absorbing discrete-time finite markov chains,

J. N. Darroch and E. Seneta, “On quasi-stationary distributions in absorbing discrete-time finite markov chains,”Journal of Applied Prob- ability, vol. 2, no. 1, pp. 88–100, 1965

work page 1965

[31] [31]

Boyd and L

S. Boyd and L. Vandenberghe,Convex Optimization. Cambridge University Press, 2004

work page 2004

[32] [32]

The entrapment problem in random walk decentralized learning,

Z. Liu, S. Rouayheb, and M. Dwyer, “The entrapment problem in random walk decentralized learning,” inIEEE ISIT, 2024

work page 2024

[33] [33]

Expander graphs and their applications,

S. Hoory, N. Linial, and A. Wigderson, “Expander graphs and their applications,”Bulletin of the American Mathematical Society, vol. 43, no. 4, pp. 439–561, 2006

work page 2006

[34] [34]

Measuring the Effects of Non-Identical Data Distribution for Federated Visual Classification

T. H. Hsu, H. Qi, and M. Brown, “Measuring the effects of non- identical data distribution for federated visual classification,” 2019. arXiv:1909.06335. APPENDIXA PRELIMINARIES: NOTATIONS, DEFINITIONS,AND ASSUMPTIONS In this Appendix, we introduce the notation, definitions, and assumptions used in the system model and its transition dynamics. Definition 3...

work page internal anchor Pith review Pith/arXiv arXiv 2019

[35] [35]

For simplicity in analysis, conditional on the eventZ t =z, we re-label the active RWs inZ t as[z]without loss of generality

Expression for∆ t(A, z):In this subsection, we first derive a closed-form expression for∆ t(A, z). For simplicity in analysis, conditional on the eventZ t =z, we re-label the active RWs inZ t as[z]without loss of generality. That is, we assign eachj∈ Z t a new index in[z]via a one-to- one mapping. Accordingly, the associated variablesL (u,j) t and H (u,j)...

work page

[36] [36]

Proof:LetA⩾¯α

Proof of Part (a):Since each RW hits the Pac-Man at node1with probability 1 N at every step, and is killed with probabilityζ= 1upon hitting it, then, given the number of active RWsZ t =z, the expected number of RWs that die at timetis z N .(34) From (22) and (34), given the number of active RWsZ t =z, we derive E[Z t+1 |Z t =z] =z 1− 1 N + ∆t(A, z) z ≜zm ...

work page

[37] [37]

Proof of Part (b):Now, we re-visit the recursion: E Zt+1 |Z t =z =zm t(A, z), wherem t(A, z)is defined in (35): mt(A, z)≜1− 1 N + ∆t(A, z) z . Lemma 4.If there exists a positive integerα , such that when A⩽α , the following inequality holds: inf t⩾0 inf z∈Z+ ∆t(A, z) z > 1 N .(42) Then, for anyA⩽α , we have Pr (∃t0,∀t⩾t 0, Z t = 0|Z 0 =z 0)<1 for allz 0. ...

work page

[38] [38]

Whenζ= 1, the absorbing stateA={1, w}, so ˜πt = [0, πchain,t], whereπ chain;t is a discrete distribution supported on a finite set of sizeN, and lim t→∞ ˜πt = [0, ν(1)]

work page

[39] [39]

small” and “large

When0< ζ <1, the absorbing stateA={w}, so˜π t = πchain,t, whereπ chain;t is a discrete distribution supported on a finite set of sizeN+ 1, and lim t→∞ ˜πt =ν (ζ). APPENDIXE THE EFFECTIVE TRANSITION PROBABILITY MATRIXP (ζ) CHAIN In fact, as discussed before, we condense the time interval between the termination of the last RW and the creation of the next R...

work page