Zeno: Distributed Stochastic Gradient Descent with Suspicion-based Fault-tolerance

Cong Xie; Indranil Gupta; Oluwasanmi Koyejo

arxiv: 1805.10032 · v3 · pith:4S7JCZKEnew · submitted 2018-05-25 · 💻 cs.LG · cs.CR · cs.DC · stat.ML

keywords zeno · descent · distributed · gradient · non-faulty · results · stochastic · workers

We present Zeno, a technique to make distributed machine learning, particularly Stochastic Gradient Descent (SGD), tolerant to an arbitrary number of faulty workers. Zeno generalizes previous results that assumed a majority of non-faulty nodes; we need assume only one non-faulty worker. Our key idea is to suspect workers that are potentially defective. Since this is likely to lead to false positives, we use a ranking-based preference mechanism. We prove the convergence of SGD for non-convex problems under these scenarios. Experimental results show that Zeno outperforms existing approaches.
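
The abstract does not spell out the scoring rule, so the following is only a minimal sketch of how a suspicion-based, ranking-driven aggregation step might look. It assumes the server can evaluate a loss on a small batch of its own. The score used here (estimated loss decrease minus a penalty on the squared gradient norm), the function name `suspicion_rank_aggregate`, and the parameters `lr`, `rho`, and `num_trimmed` are illustrative assumptions, not details taken from the abstract.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def suspicion_rank_aggregate(
    params: Vector,
    candidates: Sequence[Vector],
    loss: Callable[[Vector], float],
    lr: float,
    rho: float,
    num_trimmed: int,
) -> Vector:
    """Rank candidate gradients by a suspicion score and average the best ones.

    The `num_trimmed` lowest-scoring candidates are discarded, so a falsely
    suspected worker is only out-ranked rather than permanently banned.
    """
    if not 0 <= num_trimmed < len(candidates):
        raise ValueError("num_trimmed must leave at least one candidate")

    base = loss(params)

    def score(grad: Vector) -> float:
        # Assumed score: how much the server-side loss would drop if we
        # applied this candidate, minus a penalty on its magnitude.
        stepped = [p - lr * g for p, g in zip(params, grad)]
        return base - loss(stepped) - rho * sum(g * g for g in grad)

    ranked = sorted(candidates, key=score, reverse=True)
    kept = ranked[: len(candidates) - num_trimmed]
    # Coordinate-wise mean of the surviving candidates.
    return [sum(column) / len(kept) for column in zip(*kept)]


# Toy example: one honest worker, three faulty ones (a faulty majority).
def toy_loss(x: Vector) -> float:
    return sum(v * v for v in x)


params = [1.0, -2.0]
honest = [2.0, -4.0]  # exact gradient of toy_loss at params
faulty = [[-200.0, 400.0]] * 3  # large updates pointing the wrong way
agg = suspicion_rank_aggregate(
    params, [honest] + faulty, toy_loss, lr=0.1, rho=0.001, num_trimmed=3
)
# agg == [2.0, -4.0]: only the honest candidate survives the ranking
```

In this toy run the three faulty candidates would increase the loss sharply, so they rank last and are trimmed even though they form the majority. This matches the abstract's claim that a single non-faulty worker can suffice, under the sketch's assumption that the server's own loss estimate is trustworthy.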

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

RESIST: Resilient Decentralized Learning Using Consensus Gradient Descent
cs.LG · 2025-02 · unverdicted · novelty 6.0

RESIST achieves algorithmic and statistical convergence guarantees for strongly convex, PL, and nonconvex ERM under MITM attacks via multistep consensus gradient descent plus robust screening.