Pith Number

pith:WVMNY27O

pith:2020:WVMNY27ORQRCGY3BXFUU6EKCWV
not attested · not anchored · not stored · refs resolved

PyTorch Distributed: Experiences on Accelerating Data Parallel Training

Adam Paszke, Brian Vaughan, Jeff Smith, Omkar Salpekar, Pieter Noordhuis, Pritam Damania, Rohan Varma, Shen Li, Soumith Chintala, Teng Li, Yanli Zhao

PyTorch's distributed data parallel module achieves near-linear scaling to 256 GPUs by overlapping computation with communication.

arxiv:2006.15704 v1 · 2020-06-28 · cs.DC · cs.LG

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{WVMNY27ORQRCGY3BXFUU6EKCWV}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Evaluations show that, when configured appropriately, the PyTorch distributed data parallel module attains near-linear scalability using 256 GPUs.

C2 · weakest assumption

Typical deep learning models have enough computation per layer to overlap effectively with gradient communication, and the underlying network fabric supports low-latency all-reduce operations at the tested scale.

C3 · one line summary

PyTorch distributed data parallel attains near-linear scalability on 256 GPUs through gradient bucketing, computation-communication overlap, and selective synchronization skipping.
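
The gradient bucketing named in the summary can be illustrated with a minimal sketch. It assumes a greedy packing rule: parameters are visited in reverse registration order (roughly the order in which their gradients become ready during the backward pass) and a bucket is closed once adding the next gradient would exceed a size cap. The function name `assign_buckets`, the byte sizes, and the cap are hypothetical values chosen for illustration; the real module handles further details that this sketch omits.

```python
from typing import List


def assign_buckets(param_bytes: List[int], cap_bytes: int) -> List[List[int]]:
    """Greedily group parameter indices into buckets of at most cap_bytes.

    Parameters are visited in reverse order, so the gradients expected to be
    ready first during the backward pass land in the first bucket. A single
    parameter larger than the cap still gets a bucket of its own.
    """
    buckets: List[List[int]] = []
    current: List[int] = []
    used = 0
    for index in reversed(range(len(param_bytes))):
        size = param_bytes[index]
        # Close the current bucket if this gradient would overflow it.
        if current and used + size > cap_bytes:
            buckets.append(current)
            current, used = [], 0
        current.append(index)
        used += size
    if current:
        buckets.append(current)
    return buckets


# Hypothetical gradient sizes in bytes for five parameters, indexed 0..4.
sizes = [40, 10, 10, 30, 20]
buckets = assign_buckets(sizes, cap_bytes=50)
print(buckets)
```

Each bucket would correspond to one all-reduce call, launched as soon as every gradient in the bucket is ready, which is what lets communication overlap with the rest of the backward pass.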

References

48 extracted · 48 resolved · 6 Pith anchors

[1] PyTorch Distributed: Experiences on Accelerating Data Parallel Training 2020 · arXiv:2006.15704
[2] Then, we explain and justify the idea of data parallelism and describe communication primitives
[3] During distributed training, each process has its own local model replica and local optimizer
[4] This section focuses on the current status as of PyTorch v1.5.0
[5] In the exclusive cluster, the GPUs are located on 4 servers, connected using Mellanox MT27700 ConnectX-4 100 Gb/s NIC

Cited by

32 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:13.319742Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

b558dc6bee8c22236361b9694f1142b57431b7e931c3b1b02fcbf29e0e94bffe

Aliases

arxiv: 2006.15704 · arxiv_version: 2006.15704v1 · doi: 10.48550/arxiv.2006.15704 · pith_short_12: WVMNY27ORQRC · pith_short_16: WVMNY27ORQRCGY3B · pith_short_8: WVMNY27O
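
For this record, the 26-character Pith Number matches the first 26 characters of the RFC 4648 base32 encoding of the canonical hash, and each `pith_short_N` alias is the first N characters of the Pith Number. The sketch below checks that relationship for the values on this page. It is an observation about this one record, not a documented rule of the `pith-number/v1.0` schema, and the helper name `base32_prefix` is hypothetical.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "b558dc6bee8c22236361b9694f1142b57431b7e931c3b1b02fcbf29e0e94bffe"
PITH_NUMBER = "WVMNY27ORQRCGY3BXFUU6EKCWV"


def base32_prefix(hex_digest: str, length: int) -> str:
    """Base32-encode the raw digest bytes and keep the first `length` characters."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


# Assumption: the identifier is a truncated base32 rendering of the hash.
derived = base32_prefix(CANONICAL_HASH, len(PITH_NUMBER))

# The short aliases listed above are plain prefixes of the full number.
short_aliases = {n: PITH_NUMBER[:n] for n in (8, 12, 16)}

print(derived == PITH_NUMBER)
print(short_aliases)
```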
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/WVMNY27ORQRCGY3BXFUU6EKCWV \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: b558dc6bee8c22236361b9694f1142b57431b7e931c3b1b02fcbf29e0e94bffe
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "bbd501c503bf854713f3294e6d4239118a5d4d86df4399aa29fe93a2e5f475e5",
    "cross_cats_sorted": [
      "cs.LG"
    ],
    "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
    "primary_cat": "cs.DC",
    "submitted_at": "2020-06-28T20:39:45Z",
    "title_canon_sha256": "56fb8a04d11c1bb768f2d45e28e499a3d58de29e597e84838d5d5da921881e1f"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2006.15704",
    "kind": "arxiv",
    "version": 1
  }
}
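
The shell one-liner in the Agent API section can be unpacked into a small function. This sketch applies the same canonicalization as that command (sorted keys, compact separators, no ASCII escaping, UTF-8 bytes) and then hashes with SHA-256. The function name `canonical_sha256` is hypothetical, and the sketch does not claim that the record as printed here reproduces the published hash; it only shows that the digest does not depend on key order.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then hash as UTF-8."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# The canonical record as shown on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "bbd501c503bf854713f3294e6d4239118a5d4d86df4399aa29fe93a2e5f475e5",
        "cross_cats_sorted": ["cs.LG"],
        "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
        "primary_cat": "cs.DC",
        "submitted_at": "2020-06-28T20:39:45Z",
        "title_canon_sha256": "56fb8a04d11c1bb768f2d45e28e499a3d58de29e597e84838d5d5da921881e1f",
    },
    "schema_version": "1.0",
    "source": {"id": "2006.15704", "kind": "arxiv", "version": 1},
}

# Same content with top-level keys in a different order: the digest is unchanged.
reordered = {key: record[key] for key in ("source", "schema_version", "metadata")}

digest = canonical_sha256(record)
print(digest)
print(digest == canonical_sha256(reordered))
```

Sorting keys before hashing is what allows any mirror to recompute the same digest regardless of how its JSON library orders object members.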