Pith Number

pith:NHFUK5FQ

pith:2026:NHFUK5FQZ6ROWOJKM3CAN5SYFW
not attested · not anchored · not stored · refs resolved

BatchWeave: A Consistent Object-Store-Native Data Plane for Large Foundation Model Training

Bingyi Jing, Jiaxing Zhang, Jingyi Xi, Junjie Zhang, Songxin Zhang, Ting Sun, Xiao Yan, Zejian Xie, Zhuoyang Song, Zunyao Mao

BatchWeave builds a consistent object-store-native data plane that delivers atomic all-rank batch visibility and exactly-once recovery for distributed foundation model training.

arxiv:2605.09994 v2 · 2026-05-11 · cs.DC · cs.LG

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{NHFUK5FQZ6ROWOJKM3CAN5SYFW}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
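The merge algorithm itself is not described on this page. As a rough illustration of what "deterministic" means here, the sketch below folds a set of events into a state in a canonical order, so any mirror holding the same events arrives at the same result regardless of arrival order. The event fields (`id`, `time`, `field`, `value`), the sample events, and the last-writer-wins rule are all hypothetical, and signature verification is omitted.

```python
def merge_events(events):
    """Fold events into a state dict in a canonical order.

    Hypothetical event shape: {"id", "time", "field", "value"}.
    Sorting by (time, id) removes any dependence on arrival order;
    duplicate ids are applied once. Timestamps are assumed to share one
    ISO-8601 UTC format, so string comparison matches time order.
    """
    state = {}
    seen = set()
    for ev in sorted(events, key=lambda e: (e["time"], e["id"])):
        if ev["id"] in seen:
            continue  # a replayed event must not change the outcome
        seen.add(ev["id"])
        state[ev["field"]] = ev["value"]  # last writer wins (assumption)
    return state


# Made-up events, for illustration only.
events = [
    {"id": "e1", "time": "2026-05-20T00:00:42Z", "field": "author_claim", "value": "open"},
    {"id": "e2", "time": "2026-05-21T09:00:00Z", "field": "author_claim", "value": "claimed"},
    {"id": "e3", "time": "2026-05-20T12:00:00Z", "field": "citations", "value": "open"},
]
state = merge_events(events)
```

Feeding the same events in reverse order, or with duplicates, should produce an identical `state`, which is the property a mirror relies on.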

Claims

C1 · strongest claim

Evaluations on large-scale multimodal pre-training and SFT workloads using 64 GPUs show that BatchWeave outperforms colocated dataloader throughput while providing full failure isolation, outperforms Apache Kafka in ingestion throughput, and achieves lower consumer read latency than Kafka.

C2 · weakest assumption

Object stores can deliver the versioned-manifest ACID semantics and conditional-write performance needed for atomic all-rank batch visibility and checkpoint-aligned lifecycle management without introducing latency or throughput penalties that would erase the reported gains over colocated and Kafka baselines.
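To make this assumption concrete, here is a minimal sketch of a versioned-manifest commit guarded by a conditional write. It is not BatchWeave's implementation: `InMemoryObjectStore`, `put_if_absent`, the key layout, and the helper names are hypothetical stand-ins for an object store that supports put-if-absent semantics.

```python
import threading


class InMemoryObjectStore:
    """Toy stand-in for an object store with a put-if-absent conditional write."""

    def __init__(self):
        self._objects = {}
        self._lock = threading.Lock()

    def put_if_absent(self, key, value):
        """Write `value` only if `key` does not exist; return True on success."""
        with self._lock:
            if key in self._objects:
                return False
            self._objects[key] = value
            return True

    def get(self, key):
        with self._lock:
            return self._objects.get(key)


def commit_global_batch(store, version, shard_keys):
    """Publish the manifest for `version`; at most one writer can win."""
    return store.put_if_absent(f"manifests/{version:08d}", tuple(shard_keys))


def read_global_batch(store, version):
    """Return every rank's shard, or None while the manifest is not visible."""
    manifest = store.get(f"manifests/{version:08d}")
    if manifest is None:
        return None
    return [store.get(key) for key in manifest]


store = InMemoryObjectStore()
shard_keys = [f"batches/00000001/rank-{r}" for r in range(2)]
for r, key in enumerate(shard_keys):
    store.put_if_absent(key, f"samples-for-rank-{r}")

before = read_global_batch(store, 1)                # shards exist, manifest does not
first = commit_global_batch(store, 1, shard_keys)   # first committer wins
second = commit_global_batch(store, 1, shard_keys)  # a duplicate commit is rejected
after = read_global_batch(store, 1)
```

Readers see either no shard or all shards for a version, because visibility hinges on a single manifest object. Whether a real object store provides this at the required latency is exactly what the assumption leaves open.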

C3 · one-line summary

BatchWeave delivers an object-store-native data plane for distributed large foundation model training via transactional global batches and a decentralized adaptive commit algorithm.

References

42 extracted · 42 resolved · 5 Pith anchors

[1] Alex Aizman, Gavin Maltby, and Thomas Breuel. 2020. High Performance I/O For Large Scale Deep Learning. https://arxiv.org/abs/2001.01858. doi:10.48550/arXiv.2001.01858 arXiv:2001.01858 2020
[2] Tyler Akidau, Robert Bradshaw, Craig Chambers, Slava Chernyak, Rafael J Dagum, Sam Knight, Frances Perry, Reiner Schmidt, and Sam Whittle. 2015. The dataflow model: a practical approach to balancing c 2015
[3] Michael Armbrust, Tathagata Das, Liwen Sun, Burak Yavuz, Shixiong Zhu, Mukul Murthy, Joseph Torres, Herman van Hovell, Adrian Ionescu, Bogdan Ghit, Madhukara Bhat, Reynold Xin, Ali Ghodsi, Ion Stoic 2020
[4] Michael Armbrust, Tathagata Das, Joseph Torres, Burak Yavuz, Shixiong Liao, Yin Huai, Hossein Hosseini, Matei Zaharia, and Reynold Xin. 2018. Structured streaming: A declarative api for real-time appl 2018
[5] Andrew Audibert, Yang Chen, Dan Graur, Ana Klimovic, Jiri Simsa, and Chandramohan A. Thekkath. 2023. tf.data service: A Case for Disaggregating ML Input Data Processing. In Proceedings of the 2023 AC 2023

Formal links

2 machine-checked theorem links

Receipt and verification
First computed: 2026-05-20T00:00:42.205104Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

69cb4574b0cfa2eb392a66c406f6582dae2021bb86f2c6ea7f42c89450822884

Aliases

arxiv: 2605.09994 · arxiv_version: 2605.09994v2 · doi: 10.48550/arxiv.2605.09994 · pith_short_12: NHFUK5FQZ6RO · pith_short_16: NHFUK5FQZ6ROWOJK · pith_short_8: NHFUK5FQ
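The identifiers on this page appear to be related in a simple way: the 26-character Pith Number matches the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash, and the short aliases are prefixes of it. The sketch below checks that observation against the values listed here. It is an inference from this one record, not a documented rule of the scheme, and the function name is hypothetical.

```python
import base64

# Values copied from this page.
CANONICAL_HASH = "69cb4574b0cfa2eb392a66c406f6582dae2021bb86f2c6ea7f42c89450822884"


def pith_number_from_hash(hex_digest, length=26):
    """Base32-encode the raw digest and keep the first `length` characters.

    Assumption: the standard RFC 4648 alphabet (A-Z, 2-7), as implemented
    by base64.b32encode.
    """
    digest = bytes.fromhex(hex_digest)
    return base64.b32encode(digest).decode("ascii")[:length]


full = pith_number_from_hash(CANONICAL_HASH)

# Short aliases observed on this page are plain prefixes of the full number.
aliases = {n: full[:n] for n in (8, 12, 16)}
```

For this record, `full` comes out as `NHFUK5FQZ6ROWOJKM3CAN5SYFW`, and `aliases[8]`, `aliases[12]`, and `aliases[16]` match `pith_short_8`, `pith_short_12`, and `pith_short_16` above.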
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/NHFUK5FQZ6ROWOJKM3CAN5SYFW \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 69cb4574b0cfa2eb392a66c406f6582dae2021bb86f2c6ea7f42c89450822884
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "6d2f71694bc4c4a94be716719986f5e3e1078dae4a4a464e5773ee95df9c9284",
    "cross_cats_sorted": [
      "cs.LG"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.DC",
    "submitted_at": "2026-05-11T05:10:16Z",
    "title_canon_sha256": "95e337d1c75af465b6532b3b6aeda0777402c2554a5fe9e9aa6d6c4f1ba36531"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.09994",
    "kind": "arxiv",
    "version": 2
  }
}
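The same check can be run offline against the record shown above. This sketch mirrors the serialization used in the curl one-liner (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 bytes) and hashes the result. It assumes that one-liner is the complete canonicalization rule; the function name is hypothetical.

```python
import hashlib
import json

# Canonical record copied from this page.
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "6d2f71694bc4c4a94be716719986f5e3e1078dae4a4a464e5773ee95df9c9284",
    "cross_cats_sorted": ["cs.LG"],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.DC",
    "submitted_at": "2026-05-11T05:10:16Z",
    "title_canon_sha256": "95e337d1c75af465b6532b3b6aeda0777402c2554a5fe9e9aa6d6c4f1ba36531"
  },
  "schema_version": "1.0",
  "source": {"id": "2605.09994", "kind": "arxiv", "version": 2}
}
"""


def canonical_sha256(record):
    """Serialize with sorted keys and no whitespace, then SHA-256 the UTF-8 bytes."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


record = json.loads(RECORD_JSON)
digest = canonical_sha256(record)
print(digest)  # compare against the canonical hash listed on this page
```

Because keys are sorted before hashing, the digest does not depend on the key order or whitespace of the input JSON, only on its content.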