Rescaled Asynchronous SGD: Optimal Distributed Optimization under Data and System Heterogeneity
Pith reviewed 2026-05-14 19:27 UTC · model grok-4.3
The pith
Rescaling worker stepsizes by computation time fixes bias in asynchronous SGD so it converges to the true global objective.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Rescaled ASGD recovers convergence to stationary points of the correct global objective by rescaling each worker's stepsize proportionally to its computation time inside the standard asynchronous update rule. In the non-convex setting the method matches the optimal leading time-complexity term, with the influence of staleness and heterogeneity confined to lower-order terms.
What carries the argument
Rescaling of per-worker stepsizes in proportion to measured computation times, equalizing aggregate learning rates across heterogeneous workers inside the vanilla ASGD mechanism.
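The rescaling rule can be sketched in a few lines. A minimal sketch, assuming the rescaling takes the form η_i = η · t_i / t̄ with t̄ the mean computation time (the exact normalization constant is our assumption, not stated in the abstract):

```python
# Minimal sketch of the stepsize rescaling: eta_i = eta * t_i / t_bar.
# The normalization by the mean time t_bar is our assumption; the paper's
# exact constant may differ.
def rescaled_stepsizes(eta, comp_times):
    """One stepsize per worker, proportional to its computation time."""
    t_bar = sum(comp_times) / len(comp_times)
    return [eta * t / t_bar for t in comp_times]

# Worker i fires roughly 1/t_i times per unit time, so the aggregate
# learning rate eta_i / t_i is equalized across workers:
times = [1.0, 2.0, 4.0]
etas = rescaled_stepsizes(0.1, times)
aggregate = [e / t for e, t in zip(etas, times)]  # all entries equal
```

Equal entries in `aggregate` are exactly the "same aggregate learning rate over a cycle" property the argument rests on.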
If this is right
- The algorithm converges to stationary points of the true global objective rather than a frequency-weighted average.
- Leading time complexity matches the known lower bound for distributed non-convex optimization.
- Staleness and data heterogeneity affect only lower-order terms in the complexity bound.
- The method remains competitive with state-of-the-art baselines while using the unmodified ASGD communication pattern.
Where Pith is reading between the lines
- The same rescaling idea could be tested on other first-order asynchronous methods to restore unbiasedness without extra synchronization phases.
- In practice the approach may allow heterogeneous clusters to be used at full speed without explicit load balancing.
- Extensions to strongly convex or federated settings would be natural next checks of whether the lower-order terms remain benign.
Load-bearing premise
The objective is smooth and the local data distributions satisfy a bounded-heterogeneity condition.
What would settle it
Run Rescaled ASGD and plain ASGD side-by-side on a heterogeneous synthetic dataset whose global minimum is known; check whether the final loss of Rescaled ASGD matches the known global value while plain ASGD does not.
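A toy version of this check, under assumed specifics of our own construction (two workers, quadratic local objectives f_i(x) = ½(x − c_i)², deterministic gradients, fixed per-worker computation times):

```python
import heapq

# Toy side-by-side run (our construction, not the paper's experiment):
# worker i has local objective f_i(x) = 0.5 * (x - c_i)^2 and fires a
# gradient every t_i time units; the global minimum is at mean(c) = 0.5.
def run(T, t, c, etas):
    x = 0.0
    events = [(ti, i) for i, ti in enumerate(t)]
    heapq.heapify(events)
    while events:
        now, i = heapq.heappop(events)
        if now > T:
            break
        x -= etas[i] * (x - c[i])        # gradient of 0.5 * (x - c_i)^2
        heapq.heappush(events, (now + t[i], i))
    return x

c, t, eta = [0.0, 1.0], [1.0, 4.0], 0.05
plain = run(2000.0, t, c, [eta, eta])                    # vanilla ASGD
t_bar = sum(t) / len(t)
rescaled = run(2000.0, t, c, [eta * ti / t_bar for ti in t])
# plain drifts toward the frequency-weighted point near 0.2 (the 4x-faster
# worker dominates); rescaled settles near the true global minimum 0.5.
```

The gap between `plain` and `rescaled` at the end of the run is the bias the paper claims to remove.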
Original abstract
Asynchronous stochastic gradient descent (ASGD) is a standard way to exploit heterogeneous compute resources in distributed learning: instead of forcing fast workers to wait for slow ones, the server updates the model whenever a gradient arrives. Vanilla ASGD applies each arriving gradient with the same weight. When local data distributions are heterogeneous, this becomes problematic: faster workers contribute more updates, and we show theoretically that the method is biased toward a frequency-weighted average of the local objectives rather than the desired global objective. Existing remedies typically move away from the simple ASGD template by introducing gathering phases, buffering, or extra memory. We show that this is unnecessary. Keeping the standard ASGD mechanism, we recover the correct objective by rescaling worker-specific stepsizes in proportion to their computation times, so that each worker contributes the same aggregate learning rate over a cycle. In the non-convex setting, under smoothness and bounded heterogeneity assumptions, we prove that the resulting method, Rescaled ASGD, converges to stationary points of the correct global objective in the fixed-computation model. Its time complexity matches the known lower bound in the leading term, while the effects of staleness and data heterogeneity appear only in lower-order terms. Experiments confirm that the method converges to the correct objective and is competitive with state-of-the-art baselines.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes Rescaled ASGD, a simple modification to asynchronous SGD that rescales each worker's stepsize η_i proportionally to its computation time t_i. This ensures equal aggregate learning rates across workers over a cycle, correcting the bias of vanilla ASGD toward a frequency-weighted average of local objectives under data heterogeneity. Under standard L-smoothness and bounded heterogeneity assumptions, the paper proves convergence to stationary points of the true global objective in the fixed-computation model. The leading term of the time complexity matches known lower bounds, while staleness and heterogeneity appear only in lower-order terms. Experiments confirm convergence to the correct objective and competitiveness with baselines.
Significance. If the analysis holds, this is a significant contribution: it achieves optimal rates for heterogeneous distributed optimization with a minimal change to the standard ASGD template, avoiding extra memory, buffering, or synchronization phases. The parameter-free rescaling and the clean separation of complexity terms (leading term optimal, others lower-order) would be valuable for both theory and practice in large-scale training.
Major comments (2)
- [Abstract and convergence analysis] The rescaling η_i ∝ t_i (stated in the abstract) may violate the uniform stepsize bound required by L-smoothness analyses. Standard non-convex SGD theorems impose η ≤ O(1/L) (or similar) on the effective stepsize; when max(t_i)/min(t_i) is unbounded, the largest rescaled η_i can exceed this bound even if the nominal η is set for the fastest worker. This would invalidate the claim that staleness and heterogeneity effects are confined to lower-order terms, as the leading-term complexity relies on the stepsize condition holding uniformly. Bounded heterogeneity addresses data distributions but does not constrain computation-time ratios. The analysis section should explicitly state how the global stepsize is chosen or add a bounded-ratio assumption on t_i.
- [Model and theorem statement] The fixed-computation model is invoked for the complexity result but is not defined in the provided abstract or high-level description. The proof that Rescaled ASGD converges to the correct (unweighted) global objective rather than the frequency-weighted one depends on the precise modeling of updates and delays in this setting; without the definition and the exact error terms, the support for the central claim cannot be verified.
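The stepsize worry in the first major comment can be made concrete with a numeric sketch (the normalization η_i = η · t_i / t̄ and all figures below are our illustrative assumptions):

```python
# One extreme straggler pushes its rescaled stepsize past the 1/L bound
# even though the nominal eta respects it (illustrative numbers only).
L = 10.0
eta = 1.0 / L                          # nominal stepsize at the 1/L limit
t = [1.0, 1.0, 1.0, 100.0]             # computation times; one straggler
t_bar = sum(t) / len(t)                # 25.75
etas = [eta * ti / t_bar for ti in t]
# max(etas) = 0.1 * 100 / 25.75, roughly 0.39 -- well above 1/L = 0.1
```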
Minor comments (2)
- [Abstract] The abstract introduces the 'fixed-computation model' without a one-sentence definition; adding this would improve accessibility for readers.
- [Introduction/Notation] Notation for per-worker stepsizes η_i and times t_i should be introduced with a brief equation or table in the main text to avoid ambiguity when discussing the rescaling.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments. We address each major comment below and indicate the revisions planned for the next version of the manuscript.
Point-by-point responses
-
Referee: [Abstract and convergence analysis] The rescaling η_i ∝ t_i (stated in the abstract) may violate the uniform stepsize bound required by L-smoothness analyses. Standard non-convex SGD theorems impose η ≤ O(1/L) (or similar) on the effective stepsize; when max(t_i)/min(t_i) is unbounded, the largest rescaled η_i can exceed this bound even if the nominal η is set for the fastest worker. This would invalidate the claim that staleness and heterogeneity effects are confined to lower-order terms, as the leading-term complexity relies on the stepsize condition holding uniformly. Bounded heterogeneity addresses data distributions but does not constrain computation-time ratios. The analysis section should explicitly state how the global stepsize is chosen or add a bounded-ratio assumption on t_i.
Authors: We agree that the rescaling must be accompanied by an explicit global stepsize choice to maintain the uniform bound required by L-smoothness. In the revised manuscript we will state in Section 4 that the global stepsize is set to η = Θ(1/(L ⋅ max_i t_i)), ensuring every worker-specific stepsize η_i = η ⋅ (t_i / t̄) satisfies η_i ≤ O(1/L). Under this choice the leading term of the time complexity remains optimal up to constants that depend on the maximum computation time (inherent to any fixed-computation model), while staleness and heterogeneity remain lower-order. No additional bounded-ratio assumption on the t_i is required.
Revision planned: yes.
-
Referee: [Model and theorem statement] The fixed-computation model is invoked for the complexity result but is not defined in the provided abstract or high-level description. The proof that Rescaled ASGD converges to the correct (unweighted) global objective rather than the frequency-weighted one depends on the precise modeling of updates and delays in this setting; without the definition and the exact error terms, the support for the central claim cannot be verified.
Authors: The fixed-computation model is formally defined in Section 3, where each worker i is assigned a deterministic computation time t_i per gradient and updates arrive asynchronously with delays bounded by the t_i values. To address the concern we will add a concise definition to the revised abstract and introduction: “In the fixed-computation model each worker i requires a fixed time t_i to compute a gradient, producing asynchronous updates whose delays are proportional to t_i.” The theorem statements will explicitly reference this model, and the main text will highlight the staleness error terms (full derivations remain in the appendix).
Revision planned: yes.
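The stepsize choice proposed in the first response can be sanity-checked. A sketch assuming the hidden constant in the stated Θ(1/(L ⋅ max_i t_i)) is t̄, i.e. η = t̄/(L ⋅ max_i t_i), one concrete instance of that choice:

```python
# With eta = t_bar / (L * max_i t_i), every rescaled stepsize
# eta_i = eta * t_i / t_bar = t_i / (L * max_i t_i) stays <= 1/L
# for any ratio of computation times (constants are our assumption).
L = 10.0
t = [1.0, 1.0, 1.0, 100.0]            # same straggler scenario as above
t_bar = sum(t) / len(t)
eta = t_bar / (L * max(t))            # global stepsize per the response
etas = [eta * ti / t_bar for ti in t]
assert all(e <= 1.0 / L + 1e-12 for e in etas)
```

The slowest worker sits exactly at the 1/L limit and everyone else below it, which is why no bounded-ratio assumption on the t_i is needed under this choice.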
Circularity Check
Rescaling is a direct design choice to equalize aggregate rates; convergence proof remains independent
Full rationale
The paper defines Rescaled ASGD explicitly by setting per-worker stepsizes proportional to computation times so each contributes the same aggregate learning rate over a cycle, thereby targeting the global objective instead of a frequency-weighted one. This is presented as a motivated design fix rather than a derived prediction or fitted parameter. The subsequent non-convex convergence analysis under smoothness and bounded heterogeneity then shows the method reaches stationary points of the correct objective with leading-term time complexity matching the lower bound. No load-bearing self-citations, uniqueness theorems, or reductions by construction appear in the provided text; the central claim rests on standard analysis rather than tautology. This is a minor definitional element (score 2) with no significant circularity.
Axiom & Free-Parameter Ledger
Axioms (2)
- Domain assumption: smoothness of the objective
- Domain assumption: bounded data heterogeneity