Zeno: Distributed Stochastic Gradient Descent with Suspicion-based Fault-tolerance
We present Zeno, a technique to make distributed machine learning, particularly Stochastic Gradient Descent (SGD), tolerant to an arbitrary number of faulty workers. Zeno generalizes previous results that assumed a majority of non-faulty nodes; we need assume only one non-faulty worker. Our key idea is to suspect workers that are potentially defective. Since this is likely to lead to false positives, we use a ranking-based preference mechanism. We prove the convergence of SGD for non-convex problems under these scenarios. Experimental results show that Zeno outperforms existing approaches.
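The suspicion-and-ranking idea can be illustrated with a small sketch. The abstract does not spell out the scoring rule, so the form used here is an assumption: each candidate update is scored by the loss decrease it produces on a small validation batch, minus a penalty on its squared norm. The server then averages the candidates that remain after dropping the lowest-ranked ones. The names `zeno_style_aggregate`, `lr`, `rho`, and `num_trim` are hypothetical and chosen for illustration only.

```python
def zeno_style_aggregate(grads, x, loss, lr, rho, num_trim):
    """Average the candidate gradients left after dropping the
    `num_trim` lowest-scoring ones (illustrative sketch only)."""
    if not 0 <= num_trim < len(grads):
        raise ValueError("at least one candidate must be kept")

    def score(g):
        # Assumed score: estimated loss decrease from stepping along g,
        # minus a penalty that discourages very large updates.
        stepped = [xi - lr * gi for xi, gi in zip(x, g)]
        return loss(x) - loss(stepped) - rho * sum(gi * gi for gi in g)

    # Rank candidates from most to least trusted, then keep the top ones.
    ranked = sorted(grads, key=score, reverse=True)
    kept = ranked[: len(grads) - num_trim]
    return [sum(g[i] for g in kept) / len(kept) for i in range(len(x))]


# Toy problem: loss(x) = 0.5 * ||x||^2, whose true gradient at x is x itself.
def quadratic_loss(x):
    return 0.5 * sum(xi * xi for xi in x)

x = [1.0, -2.0]
candidates = [
    [1.0, -2.0],      # honest worker
    [1.2, -1.8],      # honest worker with noise
    [-10.0, 20.0],    # faulty: points uphill
    [100.0, 100.0],   # faulty: huge magnitude
]
agg = zeno_style_aggregate(candidates, x, quadratic_loss,
                           lr=0.1, rho=0.01, num_trim=2)
print(agg)  # close to the mean of the two honest candidates
```

In this toy run both faulty candidates increase the loss, so they rank last and are dropped. Because the ranking relies on the server's own loss estimate rather than on a majority vote among workers, the same procedure still applies when most candidates are faulty, which matches the abstract's claim that only one non-faulty worker is needed.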
Forward citations
Cited by 1 Pith paper
- RESIST: Resilient Decentralized Learning Using Consensus Gradient Descent
RESIST achieves algorithmic and statistical convergence guarantees for strongly convex, Polyak-Łojasiewicz (PL), and nonconvex empirical risk minimization (ERM) under man-in-the-middle (MITM) attacks, using multistep consensus gradient descent combined with robust screening.