pith:LG76FO4F
ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
ZeRO partitions optimizer states and gradients across devices to remove memory redundancy in parallel training.
arxiv:1910.02054 v3 · 2019-10-04 · cs.LG · cs.DC · stat.ML
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{LG76FO4FXIRQAKPLNJWYADOHLW}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
ZeRO eliminates memory redundancies in data- and model-parallel training while retaining low communication volume and high computational granularity, allowing us to scale the model size proportional to the number of devices with sustained high efficiency. Our analysis demonstrates ZeRO has the potential to scale beyond 1 Trillion parameters using today's hardware.
Key assumption: partitioning optimizer states and gradients introduces no new communication bottlenecks or synchronization overheads that scale worse than linearly when moving to thousands of devices.
ZeRO removes memory redundancies in parallel training to scale deep learning models to over a trillion parameters with high throughput on current hardware.
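To make the memory claim concrete, the following is a minimal sketch of per-device model-state memory under partitioning. It assumes mixed-precision Adam as analyzed in the ZeRO paper: 2 bytes per parameter for fp16 weights, 2 bytes for fp16 gradients, and K = 12 bytes for fp32 optimizer state. The function name `zero_bytes_per_device` and its `stage` argument are illustrative, not part of any library, and activations and buffers are ignored.

```python
def zero_bytes_per_device(num_params, num_devices, stage, k=12):
    """Approximate model-state bytes held by each device.

    stage 0: plain data parallelism (everything replicated)
    stage 1: optimizer states partitioned across devices
    stage 2: optimizer states and gradients partitioned
    Assumes mixed-precision Adam: 2 bytes/param for fp16 weights,
    2 bytes/param for fp16 gradients, k bytes/param for optimizer state.
    """
    params = 2 * num_params
    grads = 2 * num_params
    opt_state = k * num_params
    if stage >= 1:
        opt_state /= num_devices
    if stage >= 2:
        grads /= num_devices
    return params + grads + opt_state


if __name__ == "__main__":
    # Example: a 7.5B-parameter model on 64 devices.
    for stage in (0, 1, 2):
        gb = zero_bytes_per_device(7.5e9, 64, stage) / 1e9
        print(f"stage {stage}: {gb:.1f} GB per device")
```

With these assumptions, partitioning only the optimizer states already removes most of the replicated memory, because the optimizer state is the largest of the three terms.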
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-17T23:38:48.364346Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
59bfe2bb85ba230029eb6a6d800dc75da176779950b8cf7ce12fe03970dfb98d
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/LG76FO4FXIRQAKPLNJWYADOHLW \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 59bfe2bb85ba230029eb6a6d800dc75da176779950b8cf7ce12fe03970dfb98d
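The same check can be sketched in Python without the shell pipeline. This mirrors the one-liner above: serialize the canonical record with sorted keys and compact separators, encode as UTF-8, and take the SHA-256 digest. The function name `canonical_sha256` and the small `example` record are illustrative assumptions, and fetching the record from the API is left out.

```python
import hashlib
import json


def canonical_sha256(record):
    """Hash a record the way the verification one-liner does:
    sorted keys, compact separators, non-ASCII kept as-is, UTF-8 bytes."""
    payload = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


if __name__ == "__main__":
    # Hypothetical stand-in; pass the full canonical record to reproduce
    # the published hash.
    example = {"schema_version": "1.0", "source": {"kind": "arxiv", "id": "1910.02054"}}
    print(canonical_sha256(example))
```

Because the keys are sorted before hashing, the digest does not depend on the order in which the JSON fields arrive. Compare the result for the full record against the canonical hash listed above.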
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "769410855d6e6defbf18a87865b61cd2c4373b74c87a93f622ec300280dd1a77",
    "cross_cats_sorted": [
      "cs.DC",
      "stat.ML"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2019-10-04T17:29:39Z",
    "title_canon_sha256": "5c51bb8d9d15dc00904edb477c9632c6ae88312b10fbfa1a9d71978551cf7643"
  },
  "schema_version": "1.0",
  "source": {
    "id": "1910.02054",
    "kind": "arxiv",
    "version": 3
  }
}