Decentralized Learning via Random Walk with Jumps
Pith reviewed 2026-05-10 15:17 UTC · model grok-4.3
The pith
Adding Lévy jumps to Metropolis-Hastings random walks prevents entrapment and yields an explicit convergence rate in decentralized learning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In weighted random-walk learning over networks, implementing the desired sampling distribution through the Metropolis-Hastings algorithm can trap the walker in a small subgraph, producing correlated updates that slow convergence. Introducing occasional Lévy jumps into the transition probabilities restores sufficient exploration. The resulting algorithm, Metropolis-Hastings with Lévy Jumps, admits a convergence rate that makes the effects of data heterogeneity, the network's spectral gap, and the jump probability explicit and quantifiable.
What carries the argument
Metropolis-Hastings with Lévy Jumps (MHLJ), a modified transition rule that adds rare long-range moves drawn from a Lévy distribution to the standard Metropolis-Hastings walk while still relying only on local information.
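As a concrete illustration, the transition rule can be sketched as follows. The acceptance ratio is the standard Metropolis-Hastings one for a uniform-neighbor proposal; the summary does not specify the paper's exact Lévy mechanism, so `levy_jump` below (a heavy-tailed hop on a ring) is a hypothetical stand-in.

```python
import random

def mh_step(neighbors, pi, i):
    """Standard Metropolis-Hastings step toward target distribution pi.
    Uses only local quantities: the two nodes' degrees and weights."""
    j = random.choice(neighbors[i])
    accept = min(1.0, (pi[j] * len(neighbors[i])) / (pi[i] * len(neighbors[j])))
    return j if random.random() < accept else i

def mhlj_step(neighbors, pi, i, p_jump, levy_jump):
    """MHLJ sketch: with probability p_jump take a long-range move drawn
    by levy_jump (a stand-in for the paper's Levy mechanism); otherwise
    take an ordinary local MH step."""
    if random.random() < p_jump:
        return levy_jump(i)
    return mh_step(neighbors, pi, i)

# Illustrative run on a ring of n nodes with a heavy-tailed jump length.
n = 50
neighbors = {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}
pi = {v: 1.0 / n for v in range(n)}

def levy_jump(i):
    d = int(random.paretovariate(1.0)) + 1  # heavy-tailed hop distance
    return (i + random.choice((-d, d))) % n

random.seed(0)
node, visited = 0, {0}
for _ in range(1000):
    node = mhlj_step(neighbors, pi, node, 0.1, levy_jump)
    visited.add(node)
```

With a uniform target on the ring, every local proposal is accepted, so the walk mixes by construction; the jump only matters once the target weights and topology are heterogeneous.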
If this is right
- The convergence rate improves with moderate jump probability because jumps reduce the correlation time induced by heterogeneity.
- Networks whose spectral gap is small gain the largest relative speed-up from the jumps.
- The method preserves the low communication and computation cost of single-token random walks.
- The explicit rate formula lets practitioners choose jump probability to balance exploration against local update quality.
Where Pith is reading between the lines
- The same jump idea could be added to other token-passing or gossip algorithms that suffer from slow mixing on sparse graphs.
- An adaptive rule for choosing jump probability locally, perhaps based on recent visit counts, might remove the need to tune it in advance.
- If the Lévy distribution can be approximated with a small fixed set of long-range links, the approach might extend to privacy-sensitive or bandwidth-limited deployments.
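The adaptive idea in the second bullet can be made concrete. The rule below is purely speculative (it is not in the paper): it raises the jump probability when the walk's recent history contains many repeated visits, i.e. when the walk looks trapped. The names and the constants `p_min` and `p_max` are hypothetical.

```python
def adaptive_jump_prob(recent_visits, p_min=0.02, p_max=0.5):
    """Speculative local rule: map the repeat fraction among recently
    visited nodes to a jump probability in [p_min, p_max]."""
    if not recent_visits:
        return p_min
    # Fraction of recent steps that revisited an already-seen node.
    repeat_fraction = 1.0 - len(set(recent_visits)) / len(recent_visits)
    return p_min + (p_max - p_min) * repeat_fraction
```

A trapped walk that cycles among a few nodes gets a repeat fraction near one and hence a jump probability near `p_max`; a well-mixed walk stays near `p_min`, preserving local update quality.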
Load-bearing premise
That the added Lévy jumps increase mixing enough to escape local traps without biasing the sampling distribution or requiring any global network knowledge.
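This premise can at least be checked numerically for one standard construction: route the jump proposal through the same Metropolis-Hastings acceptance step, which preserves the target distribution for any proposal. The uniform long-range jump below is an assumption for illustration, not the paper's construction.

```python
import numpy as np

def mhlj_kernel(adj, pi, p):
    """Metropolized kernel whose proposal mixes a uniform-neighbor move
    (prob 1-p) with a uniform long-range jump (prob p). The MH
    acceptance step keeps pi stationary regardless of the proposal."""
    n = len(pi)
    deg = adj.sum(axis=1)
    Q = (1 - p) * adj / deg[:, None] + p * (1 - np.eye(n)) / (n - 1)
    num = pi[None, :] * Q.T            # pi_j * Q(j, i)
    den = (pi[:, None] * Q).copy()     # pi_i * Q(i, j)
    np.fill_diagonal(den, 1.0)         # avoid 0/0 on the diagonal
    P = Q * np.minimum(1.0, num / den)
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))  # self-loop absorbs rejections
    return P

# 6-cycle with a non-uniform target distribution.
n = 6
adj = np.zeros((n, n))
for i in range(n):
    adj[i, (i + 1) % n] = adj[i, (i - 1) % n] = 1.0
pi = np.arange(1, n + 1, dtype=float)
pi /= pi.sum()
P = mhlj_kernel(adj, pi, p=0.2)
```

For this construction, detailed balance holds exactly because `pi[i] * P[i, j]` equals `min(pi[i] * Q[i, j], pi[j] * Q[j, i])`, a symmetric quantity; the question the premise leaves open is whether a Lévy jump can be realized this way with only local information.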
What would settle it
On a network with small spectral gap and high data heterogeneity, measure whether the observed convergence rate follows the predicted dependence on jump probability or whether the walk remains trapped and updates stay correlated despite the jumps.
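A minimal version of that experiment can be run in simulation. The barbell graph below (two cliques joined by one edge) is a standard slow-mixing stand-in for a small spectral gap; the uniform global jump is an illustrative simplification of the Lévy mechanism, not the paper's construction, and `m`, `steps`, and `p_jump` are arbitrary choices.

```python
import random

def barbell(m):
    """Two m-cliques joined by a single bridge edge."""
    nbrs = {v: [u for u in range(m) if u != v] for v in range(m)}
    nbrs.update({m + v: [m + u for u in range(m) if u != v] for v in range(m)})
    nbrs[m - 1].append(m)  # bridge edge between the cliques
    nbrs[m].append(m - 1)
    return nbrs

def coverage(nbrs, steps, p_jump, seed):
    """Distinct nodes visited by a uniform-target MH walk started at node
    0; with prob p_jump, jump uniformly over all nodes (an illustrative
    stand-in for a Levy jump)."""
    rng = random.Random(seed)
    nodes = list(nbrs)
    v, seen = 0, {0}
    for _ in range(steps):
        if rng.random() < p_jump:
            v = rng.choice(nodes)
        else:
            w = rng.choice(nbrs[v])
            # MH acceptance for a uniform target: min(1, deg_v / deg_w)
            if rng.random() < min(1.0, len(nbrs[v]) / len(nbrs[w])):
                v = w
        seen.add(v)
    return len(seen)

m, steps, trials = 15, 200, 30
plain = sum(coverage(barbell(m), steps, 0.0, s) for s in range(trials)) / trials
jumpy = sum(coverage(barbell(m), steps, 0.2, s) for s in range(trials)) / trials
```

Coverage is a crude proxy; the paper's prediction concerns the convergence rate itself, so a full test would also track the optimization error against the rate's predicted dependence on the jump probability.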
Original abstract
We study decentralized learning over networks where data are distributed across nodes without a central coordinator. Random walk learning is a token-based approach in which a single model is propagated across the network and updated at each visited node using local data, thereby incurring low communication and computational overheads. In weighted random-walk learning, the transition matrix is designed to achieve a desired sampling distribution, thereby speeding up convergence under data heterogeneity. We show that implementing weighted sampling via the Metropolis-Hastings algorithm can lead to a previously unexplored phenomenon we term entrapment. The random walk may become trapped in a small region of the network, resulting in highly correlated updates and severely degraded convergence. To address this issue, we propose Metropolis-Hastings with Lévy jumps (MHLJ), which introduces occasional long-range transitions to restore exploration while respecting local information constraints. We establish a convergence rate that explicitly characterizes the roles of data heterogeneity, network spectral gap, and jump probability, and demonstrate through experiments that MHLJ effectively eliminates entrapment and significantly speeds up decentralized learning.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper addresses decentralized learning on networks with heterogeneous data using a token-based random walk approach. It identifies an 'entrapment' issue in Metropolis-Hastings weighted sampling that leads to poor mixing and correlated updates. To mitigate this, the authors introduce Metropolis-Hastings with Lévy jumps (MHLJ), which adds occasional long-range transitions. They claim to derive a convergence rate explicitly depending on data heterogeneity, the network spectral gap, and the jump probability p, while showing via experiments that MHLJ eliminates entrapment and accelerates learning compared to baselines.
Significance. If the convergence analysis holds and the experiments are reproducible, the work would be significant for decentralized optimization: it provides a practical fix for mixing problems in token-based methods and a rate that separates the contributions of heterogeneity, topology, and the tunable jump parameter. This could inform algorithm design in communication-constrained settings where standard random walks fail under heterogeneity.
major comments (3)
- [§3 (Convergence Analysis), Theorem 1] The stated convergence rate is claimed to characterize the role of jump probability p, but the derivation does not specify how p is selected using only local information; if p requires knowledge of the global spectral gap or heterogeneity level to avoid biasing the stationary distribution, the bound does not apply under the paper's decentralized constraint.
- [§4 (Transition Kernel)] The proof that the MHLJ kernel preserves the target stationary distribution while incorporating Lévy jumps is not shown in detail; without an explicit verification that the jump step maintains detailed balance using only neighbor information, the rate's validity is undermined.
- [Experimental section, Table 2 and Figure 3] The reported speed-ups and entrapment elimination are presented for fixed p values, but no sensitivity analysis or local tuning procedure for p is provided, leaving open whether the empirical gains hold when p must be chosen without global network knowledge.
minor comments (2)
- [Method section] The exact probability distribution for the Lévy jumps (e.g., uniform over non-neighbors or power-law) should be stated explicitly in the method section to allow reproduction.
- [Notation] Notation for the spectral gap and heterogeneity measure should be defined consistently between the theorem statement and the experimental setup.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive comments on our manuscript. We provide point-by-point responses to the major comments below, offering clarifications and committing to revisions where they strengthen the paper.
Point-by-point responses
-
Referee: [§3 (Convergence Analysis), Theorem 1] The stated convergence rate is claimed to characterize the role of jump probability p, but the derivation does not specify how p is selected using only local information; if p requires knowledge of the global spectral gap or heterogeneity level to avoid biasing the stationary distribution, the bound does not apply under the paper's decentralized constraint.
Authors: We clarify that p is a user-chosen parameter that can be set locally without any global information. The convergence bound in Theorem 1 holds for any p in (0,1], and the analysis does not require p to be tuned based on the spectral gap or heterogeneity; it explicitly shows the dependence on these quantities separately from p. The stationary distribution is preserved for any such p, as the jump transitions are constructed to satisfy detailed balance using only local information. Therefore, the decentralized setting is fully respected, and the bound applies as stated. revision: no
-
Referee: [§4 (Transition Kernel)] The proof that the MHLJ kernel preserves the target stationary distribution while incorporating Lévy jumps is not shown in detail; without an explicit verification that the jump step maintains detailed balance using only neighbor information, the rate's validity is undermined.
Authors: We agree with this observation. The main text provided a high-level description, but the detailed proof was omitted for brevity. In the revised manuscript, we will include a full proof in the appendix that explicitly verifies the preservation of the stationary distribution under the Lévy jump mechanism. This proof demonstrates that detailed balance is maintained using only information from neighboring nodes, ensuring no bias is introduced. revision: yes
-
Referee: [Experimental section, Table 2 and Figure 3] The reported speed-ups and entrapment elimination are presented for fixed p values, but no sensitivity analysis or local tuning procedure for p is provided, leaving open whether the empirical gains hold when p must be chosen without global network knowledge.
Authors: We concur that a sensitivity study would be beneficial. We will revise the experimental section to include a sensitivity analysis over a range of p values on various network topologies. Furthermore, we will describe and evaluate a decentralized procedure for selecting p locally, for example by using local estimates of connectivity, and show that the observed improvements remain consistent. revision: yes
Circularity Check
No circularity detected in convergence analysis
Full rationale
The paper derives a convergence rate for Metropolis-Hastings with Lévy jumps by analyzing the transition kernel's mixing properties under data heterogeneity and network spectral gap. This rate is presented as a function of those parameters plus the jump probability, using standard Markov chain techniques on the augmented graph. No step reduces the claimed rate to a fitted quantity, self-definition, or load-bearing self-citation; the analysis treats the jump probability as an exogenous design parameter whose effect is characterized rather than presupposed. The derivation remains independent of the target empirical claims.
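One concrete consequence of treating the jump probability as an exogenous parameter can be checked directly. Mixing a lazy random-walk kernel on a ring with an idealized global jump matrix shrinks every non-principal eigenvalue by a factor (1 - p), enlarging the spectral gap. The uniform jump matrix is an idealization (doubly stochastic, so it preserves the uniform target); it is not the paper's kernel.

```python
import numpy as np

n, p = 20, 0.1
adj = np.zeros((n, n))
for i in range(n):
    adj[i, (i + 1) % n] = adj[i, (i - 1) % n] = 1.0

# Lazy MH walk on the ring (uniform target, so every move is accepted);
# laziness removes the periodicity of the even cycle.
P = 0.5 * np.eye(n) + 0.25 * adj
# Idealized global jump: uniform over all nodes, doubly stochastic.
J = np.ones((n, n)) / n
P_jump = (1 - p) * P + p * J

def spectral_gap(M):
    """1 minus the second-largest eigenvalue modulus (M symmetric)."""
    ev = np.sort(np.abs(np.linalg.eigvalsh(M)))[::-1]
    return 1.0 - ev[1]
```

For vectors orthogonal to the all-ones vector, `J` acts as zero, so the non-principal spectrum of `P_jump` is exactly `(1 - p)` times that of `P`: the gap widens from roughly 0.024 to roughly 0.122 here, which is the mechanism by which jumps shorten mixing time.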
Axiom & Free-Parameter Ledger
free parameters (1)
- jump probability
axioms (1)
- Domain assumption: the network is connected with a positive spectral gap, and data heterogeneity is bounded.
invented entities (1)
- Lévy jumps (no independent evidence)