arXiv preprint arXiv:2103.01447 · 2021

6 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

SILAGE: Memory-Efficient, Full-Gradient-Free Nonconvex Optimization for Nested Finite Sums

cs.LG · 2026-06-14 · unverdicted · novelty 7.0

SILAGE is a variance-reduced algorithm for nested finite-sum nonconvex optimization that uses O(n) memory, evaluates at most one local group gradient per iteration, and adapts convergence to data heterogeneity parameters δ1 and δ2.

LOSCAR-SGD: Local SGD with Communication-Computation Overlap and Delay-Corrected Sparse Model Averaging

cs.LG · 2026-05-20 · unverdicted · novelty 7.0

LOSCAR-SGD combines local updates, sparse model averaging, and communication-computation overlap with a delay-corrected merge rule, providing convergence rates for smooth non-convex objectives under worker heterogeneity.

Ringmaster LMO: Asynchronous Linear Minimization Oracle Momentum Method

cs.LG · 2026-05-18 · unverdicted · novelty 7.0

Ringmaster LMO extends delay-thresholding from ASGD to LMO-based momentum updates, providing convergence guarantees under (L0, L1)-smoothness and time-complexity bounds that recover optimal rates in the Euclidean case.

Scalable Distributed Stochastic Optimization via Bidirectional Compression: Beyond Pessimistic Limits

math.OC · 2026-05-08 · unverdicted · novelty 7.0

Inkheart SGD and M4 use bidirectional compression to achieve time complexities in distributed SGD that improve with worker count n and surpass prior lower bounds under a necessary structural assumption.

Rescaled Asynchronous SGD: Optimal Distributed Optimization under Data and System Heterogeneity

cs.LG · 2026-05-13 · unverdicted · novelty 6.0

Rescaled ASGD recovers convergence to the true global objective by rescaling worker stepsizes proportional to computation times, matching the known time lower bound in the leading term under non-convex smoothness and bounded heterogeneity.

Rennala MVR: Improved Time Complexity for Parallel Stochastic Optimization via Momentum-Based Variance Reduction

math.OC · 2026-05-09 · unverdicted · novelty 5.0

Rennala MVR improves time complexity over Rennala SGD for smooth nonconvex stochastic optimization in heterogeneous parallel systems under a mean-squared smoothness assumption.

citing papers explorer

Showing 1 of 1 citing paper after filters.

Scalable Distributed Stochastic Optimization via Bidirectional Compression: Beyond Pessimistic Limits math.OC · 2026-05-08 · unverdicted · none · ref 85
