Towards Federated Learning at Scale: System Design

Alex Ingerman; Chloe Kiddon; Daniel Ramage; David Petrou; Dzmitry Huba; H. Brendan McMahan; Hubert Eichner; Jakub Konečný; Jason Roselander; Keith Bonawitz

arXiv: 1902.01046 · v2 · pith:5FLRFMSYnew · submitted 2019-02-04 · cs.LG · cs.DC · stat.ML

Keith Bonawitz, Hubert Eichner, Wolfgang Grieskamp, Dzmitry Huba, Alex Ingerman, Vladimir Ivanov, Chloe Kiddon, Jakub Konečný, Stefano Mazzocchi, H. Brendan McMahan, Timon Van Overveldt, David Petrou, Daniel Ramage, Jason Roselander

classification: cs.LG, cs.DC, stat.ML

keywords: learning, federated, design, system, approach, built, challenges, corpus

Federated Learning is a distributed machine learning approach which enables model training on a large corpus of decentralized data. We have built a scalable production system for Federated Learning in the domain of mobile devices, based on TensorFlow. In this paper, we describe the resulting high-level design, sketch some of the challenges and their solutions, and touch upon the open problems and future directions.

Forward citations

Cited by 6 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

LOSCAR-SGD: Local SGD with Communication-Computation Overlap and Delay-Corrected Sparse Model Averaging
cs.LG 2026-05 unverdicted novelty 7.0

LOSCAR-SGD combines local updates, sparse model averaging, and communication-computation overlap with a delay-corrected merge rule, providing convergence rates for smooth non-convex objectives under worker heterogeneity.

Ringmaster LMO: Asynchronous Linear Minimization Oracle Momentum Method
cs.LG 2026-05 unverdicted novelty 7.0

Ringmaster LMO extends delay-thresholding from ASGD to LMO-based momentum updates, providing convergence guarantees under (L0, L1)-smoothness and time-complexity bounds that recover optimal rates in the Euclidean case.

Scalable Distributed Stochastic Optimization via Bidirectional Compression: Beyond Pessimistic Limits
math.OC 2026-05 unverdicted novelty 7.0

Inkheart SGD and M4 use bidirectional compression to achieve time complexities in distributed SGD that improve with worker count n and surpass prior lower bounds under a necessary structural assumption.

Rennala MVR: Improved Time Complexity for Parallel Stochastic Optimization via Momentum-Based Variance Reduction
math.OC 2026-05 unverdicted novelty 5.0

Rennala MVR improves time complexity over Rennala SGD for smooth nonconvex stochastic optimization in heterogeneous parallel systems under a mean-squared smoothness assumption.

Centralized vs Decentralized Federated Learning: A trade-off performance analysis
cs.LG 2026-05 unverdicted novelty 4.0

Experimental analysis of performance trade-offs across CFL, DFL, and SDFL using Fedstellar simulator, MNIST, and MLP.

Memory as Metabolism: A Design for Companion Knowledge Systems
cs.AI 2026-04 unverdicted novelty 4.0

This paper designs a companion knowledge system with TRIAGE, DECAY, CONTEXTUALIZE, CONSOLIDATE, and AUDIT operations plus memory gravity and minority-hypothesis retention to give contradictory evidence a path to updat...