Pith Number

pith:LG76FO4F

pith:2019:LG76FO4FXIRQAKPLNJWYADOHLW

not attested · not anchored · not stored · refs resolved

ZeRO: Memory Optimizations Toward Training Trillion Parameter Models

Jeff Rasley, Olatunji Ruwase, Samyam Rajbhandari, Yuxiong He

ZeRO partitions optimizer states and gradients across devices to remove memory redundancy in parallel training.

arxiv:1910.02054 v3 · 2019-10-04 · cs.LG · cs.DC · stat.ML


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{LG76FO4FXIRQAKPLNJWYADOHLW}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live · download bundle · merged state

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

ZeRO eliminates memory redundancies in data- and model-parallel training while retaining low communication volume and high computational granularity, allowing us to scale the model size proportional to the number of devices with sustained high efficiency. Our analysis demonstrates ZeRO has the potential to scale beyond 1 Trillion parameters using today's hardware.

C2 · weakest assumption

The assumption that partitioning optimizer states and gradients will not introduce new communication bottlenecks or synchronization overheads that scale worse than linearly when moving to thousands of devices.

C3 · one-line summary

ZeRO removes memory redundancies in parallel training to scale deep learning models to over a trillion parameters with high throughput on current hardware.
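The memory claim above can be made concrete with a little arithmetic. The sketch below assumes the mixed-precision Adam accounting used in the paper's analysis: 2 bytes per parameter for fp16 weights, 2 bytes for fp16 gradients, and K = 12 bytes of optimizer state. It compares plain data parallelism with three partitioning stages: optimizer states only, optimizer states plus gradients, and a third stage that also partitions the parameters. The function name `zero_memory_gb` and its arguments are hypothetical, chosen only for illustration.

```python
def zero_memory_gb(params_billion: float, n_devices: int, k: int = 12) -> dict:
    """Per-device model-state memory in GB (10^9 bytes).

    Assumes 2 bytes/param for fp16 weights, 2 bytes/param for fp16
    gradients, and k bytes/param of optimizer state (k = 12 for
    mixed-precision Adam). Activations and buffers are not counted.
    """
    psi = params_billion  # billions of params x bytes/param = GB
    return {
        # Plain data parallelism: every device holds everything.
        "baseline": (2 + 2 + k) * psi,
        # Partition optimizer states only.
        "P_os": (2 + 2) * psi + k * psi / n_devices,
        # Partition optimizer states and gradients.
        "P_os+g": 2 * psi + (2 + k) * psi / n_devices,
        # Partition optimizer states, gradients, and parameters.
        "P_os+g+p": (2 + 2 + k) * psi / n_devices,
    }


# Example: a 7.5B-parameter model trained on 64 devices.
mem = zero_memory_gb(7.5, 64)
for stage, gb in mem.items():
    print(f"{stage:>9}: {gb:7.2f} GB")
```

Only the fully partitioned stage shrinks in direct proportion to the device count, which is the scaling behaviour the strongest claim refers to. The sketch says nothing about communication cost, which is the subject of the weakest assumption.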

References

26 extracted · 26 resolved · 9 Pith anchors

[1] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding 2018 · arXiv:1810.04805

[2] Language models are unsupervised multitask learners 2019

[3] Megatron-lm: Training multi-billion parameter language models using model parallelism 2019

[4] Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. Exploring the limits of transfer learning with a unified text-to-text tran 2019

[5] Nimit Sharad Sohoni, Christopher Richard Aberger, Megan Leszczynski, Jian Zhang, and Christopher Ré 2018 · arXiv:1811.02084

Formal links

2 machine-checked theorem links

Cited by

31 papers in Pith

Training LLMs on HPC Systems: Best Practices from the OpenGPT-X Project

Hawkeye: Reproducing GPU-Level Non-Determinism

Charon: A Unified and Fine-Grained Simulator for Large-Scale LLM Training and Inference

AGPO: Adaptive Group Policy Optimization with Dual Statistical Feedback

Towards Human-Level Book-Writing Capability

Receipt and verification

First computed	2026-05-17T23:38:48.364346Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

59bfe2bb85ba230029eb6a6d800dc75da176779950b8cf7ce12fe03970dfb98d

Aliases

arxiv: 1910.02054 · arxiv_version: 1910.02054v3 · doi: 10.48550/arxiv.1910.02054 · pith_short_12: LG76FO4FXIRQ · pith_short_16: LG76FO4FXIRQAKPL · pith_short_8: LG76FO4F

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/LG76FO4FXIRQAKPLNJWYADOHLW \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 59bfe2bb85ba230029eb6a6d800dc75da176779950b8cf7ce12fe03970dfb98d

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "769410855d6e6defbf18a87865b61cd2c4373b74c87a93f622ec300280dd1a77",
    "cross_cats_sorted": [
      "cs.DC",
      "stat.ML"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2019-10-04T17:29:39Z",
    "title_canon_sha256": "5c51bb8d9d15dc00904edb477c9632c6ae88312b10fbfa1a9d71978551cf7643"
  },
  "schema_version": "1.0",
  "source": {
    "id": "1910.02054",
    "kind": "arxiv",
    "version": 3
  }
}
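
The verification command under "Verify this Pith Number yourself" re-serializes the canonical record with sorted keys and compact separators, then hashes the resulting bytes with SHA-256. The sketch below runs that same step offline against the record printed above, so it needs no network access. The helper name `canonical_hash` is hypothetical. Whether the digest equals the listed canonical hash depends on the resolver returning exactly this record, which has not been checked here.

```python
import hashlib
import json


def canonical_hash(record: dict) -> str:
    """SHA-256 of the record serialized with sorted keys and no whitespace."""
    payload = json.dumps(
        record,
        sort_keys=True,            # key order must not affect the hash
        separators=(",", ":"),     # compact form, no spaces
        ensure_ascii=False,        # keep non-ASCII characters as UTF-8
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# The canonical record as shown on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "769410855d6e6defbf18a87865b61cd2c4373b74c87a93f622ec300280dd1a77",
        "cross_cats_sorted": ["cs.DC", "stat.ML"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2019-10-04T17:29:39Z",
        "title_canon_sha256": "5c51bb8d9d15dc00904edb477c9632c6ae88312b10fbfa1a9d71978551cf7643",
    },
    "schema_version": "1.0",
    "source": {"id": "1910.02054", "kind": "arxiv", "version": 3},
}

digest = canonical_hash(record)
print(digest)  # compare by eye with the "Canonical hash" field above
```

Because the keys are sorted before hashing, two mirrors that store the same record with different key order should still compute the same digest.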