pith:NHFUK5FQ
BatchWeave: A Consistent Object-Store-Native Data Plane for Large Foundation Model Training
BatchWeave builds a consistent object-store-native data plane that delivers atomic all-rank batch visibility and exactly-once recovery for distributed foundation model training.
arxiv:2605.09994 v2 · 2026-05-11 · cs.DC · cs.LG
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{NHFUK5FQZ6ROWOJKM3CAN5SYFW}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Evaluations on large-scale multimodal pre-training and SFT workloads on 64 GPUs show that BatchWeave exceeds the throughput of a colocated dataloader while providing full failure isolation, ingests data faster than Apache Kafka, and serves consumer reads at lower latency than Kafka.
Object stores can deliver the versioned-manifest ACID semantics and conditional-write performance needed for atomic all-rank batch visibility and checkpoint-aligned lifecycle management without introducing latency or throughput penalties that would erase the reported gains over colocated and Kafka baselines.
BatchWeave delivers an object-store-native data plane for distributed large foundation model training via transactional global batches and a decentralized adaptive commit algorithm.
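To make "atomic all-rank batch visibility" and "conditional-write" concrete, here is a toy, in-memory sketch of a versioned manifest updated by compare-and-swap. It illustrates the general idea only and is not BatchWeave's algorithm: `ToyManifestStore`, `commit_if`, and `publish_batch` are hypothetical names, and a real object store would expose the conditional write through its own API.

```python
from dataclasses import dataclass


@dataclass
class ToyManifestStore:
    """Hypothetical in-memory stand-in for an object store with conditional writes."""
    version: int = 0
    batches: tuple = ()

    def read(self):
        return self.version, self.batches

    def commit_if(self, expected_version: int, new_batches: tuple) -> bool:
        # Conditional write: succeeds only if no one committed since we read.
        if self.version != expected_version:
            return False
        self.version += 1
        self.batches = new_batches
        return True


def publish_batch(store, batch_id, shards_by_rank, world_size):
    """Make a batch visible in one manifest update, or not at all."""
    # Refuse to publish a batch that is missing any rank's shard.
    if set(shards_by_rank) != set(range(world_size)):
        raise ValueError("batch must contain exactly one shard per rank")
    while True:
        version, batches = store.read()
        if any(b[0] == batch_id for b in batches):
            return version  # already published, so a retry is a no-op
        entry = (batch_id, tuple(sorted(shards_by_rank.items())))
        if store.commit_if(version, batches + (entry,)):
            return version + 1
        # Lost the race to another writer: re-read and try again.
```

Because every rank's shard enters the manifest in the same conditional write, a reader sees either the whole batch or none of it, and a producer that retries after a crash does not publish the batch twice.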
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-20T00:00:42.205104Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) |
| Schema | pith-number/v1.0 |
Canonical hash
69cb4574b0cfa2eb392a66c406f6582dae2021bb86f2c6ea7f42c89450822884
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/NHFUK5FQZ6ROWOJKM3CAN5SYFW \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 69cb4574b0cfa2eb392a66c406f6582dae2021bb86f2c6ea7f42c89450822884
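The one-liner above can be unpacked into a small Python helper. This is a minimal sketch that mirrors the same serialization settings (sorted keys, compact separators, no ASCII escaping, UTF-8 bytes); the function names `canonical_hash` and `verify` are illustrative and not part of any Pith tooling.

```python
import hashlib
import json


def canonical_hash(record: dict) -> str:
    """SHA-256 of the record serialized with sorted keys and compact separators."""
    payload = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode()  # str.encode() defaults to UTF-8
    return hashlib.sha256(payload).hexdigest()


def verify(record: dict, expected_hex: str) -> bool:
    """Compare the recomputed hash against the published canonical hash."""
    return canonical_hash(record) == expected_hex.lower()
```

Sorting keys and fixing the separators is what makes the hash independent of key order and whitespace in the JSON as fetched, so passing the parsed `canonical_record` together with the published canonical hash to `verify` should reproduce the check done by the shell pipeline.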
Canonical record JSON
{
"metadata": {
"abstract_canon_sha256": "6d2f71694bc4c4a94be716719986f5e3e1078dae4a4a464e5773ee95df9c9284",
"cross_cats_sorted": [
"cs.LG"
],
"license": "http://creativecommons.org/licenses/by/4.0/",
"primary_cat": "cs.DC",
"submitted_at": "2026-05-11T05:10:16Z",
"title_canon_sha256": "95e337d1c75af465b6532b3b6aeda0777402c2554a5fe9e9aa6d6c4f1ba36531"
},
"schema_version": "1.0",
"source": {
"id": "2605.09994",
"kind": "arxiv",
"version": 2
}
}