ATOMO: Communication-efficient Learning via Atomic Sparsification

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

Distributed model training suffers from communication overheads due to frequent gradient updates transmitted between compute nodes. To mitigate these overheads, several studies propose the use of sparsified stochastic gradients. We argue that these are facets of a general sparsification method that can operate on any possible atomic decomposition. Notable examples include element-wise, singular value, and Fourier decompositions. We present ATOMO, a general framework for atomic sparsification of stochastic gradients. Given a gradient, an atomic decomposition, and a sparsity budget, ATOMO gives a random unbiased sparsification of the atoms minimizing variance. We show that recent methods such as QSGD and TernGrad are special cases of ATOMO and that sparsifying the singular value decomposition of neural network gradients, rather than their coordinates, can lead to significantly faster distributed training.
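The idea in the abstract (keep each atom at random, rescale the survivors so the result is unbiased, and spend a fixed sparsity budget) can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the authors' implementation. The function names (`sparsification_probs`, `atomic_sparsify`, `svd_sparsify`) are hypothetical, and the specific probability rule used here (keep-probability proportional to coefficient magnitude, capped at 1, with the leftover budget redistributed) is an assumption that the abstract does not spell out.

```python
import numpy as np


def sparsification_probs(coeffs, budget):
    """Keep-probabilities proportional to |coeff|, capped at 1.

    Assumed rule (not stated in the abstract): atoms whose scaled
    probability would reach 1 are always kept, and the remaining budget
    is redistributed over the other atoms.
    """
    mags = np.abs(np.asarray(coeffs, dtype=float))
    probs = np.zeros_like(mags)
    active = mags > 0              # zero atoms never need to be sent
    remaining = float(budget)
    while remaining > 0 and active.any():
        scaled = remaining * mags / mags[active].sum()
        over = active & (scaled >= 1.0)
        if not over.any():
            probs[active] = scaled[active]
            break
        probs[over] = 1.0          # always keep the dominant atoms
        remaining -= over.sum()
        active &= ~over
    return probs


def atomic_sparsify(coeffs, budget, rng):
    """Randomly drop atoms; rescale kept ones by 1/p so E[output] = coeffs."""
    coeffs = np.asarray(coeffs, dtype=float)
    probs = sparsification_probs(coeffs, budget)
    keep = rng.random(coeffs.shape) < probs
    out = np.zeros_like(coeffs)
    out[keep] = coeffs[keep] / probs[keep]
    return out


def svd_sparsify(grad, budget, rng):
    """Sparsify the singular values of a 2-D gradient instead of its entries."""
    u, s, vt = np.linalg.svd(grad, full_matrices=False)
    s_hat = atomic_sparsify(s, budget, rng)
    return (u * s_hat) @ vt        # scales column i of u by s_hat[i]


rng = np.random.default_rng(0)
grad = rng.standard_normal((6, 4))
entrywise = atomic_sparsify(grad.ravel(), budget=8, rng=rng).reshape(grad.shape)
low_rank = svd_sparsify(grad, budget=2, rng=rng)
```

With element-wise atoms the budget bounds the expected number of nonzero entries sent; with singular-value atoms it bounds the expected number of rank-one terms, which is the distinction the abstract draws between sparsifying coordinates and sparsifying the singular value decomposition.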


representative citing papers

  • SCAPE: Accurate and Efficient LLM Training with Extreme Sparse Communication cs.LG · 2026-07-02 · conditional · none · ref 45 · internal anchor

    SCAPE enables 90-99% sparse gradient communication in sharded Adam-style LLM training by deriving masks from first-moment statistics, achieving up to 43.3% faster pre-training on Llama-500M with no degradation in validation loss or downstream accuracy.