Pith Number

pith:7NH7VC5I

pith:2025:7NH7VC5IJVQDP67TU5GSEZPLKV

not attested · not anchored · not stored · refs resolved

SHIRO: Near-Optimal Communication Strategies for Distributed Sparse Matrix Multiplication

Benjamin Brock, Chen Zhuang, Du Wu, Lingqi Zhang, Mohamed Wahib, Peng Chen, Satoshi Matsuoka, Toshio Endo

SHIRO reduces distributed SpMM communication overhead by sending only the data needed for non-zero multiplications and prioritizing fast intra-node GPU links.
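The idea of sending only the data needed for non-zero multiplications can be shown with a minimal Python sketch of sparsity-aware communication for C = A·B. The sketch assumes a hypothetical 1D row partition, where each process owns a contiguous block of rows of both A and the dense matrix B. It is a generic illustration and not SHIRO's actual algorithm. The partition layout, `bounds`, the helper names (`owner_of`, `fine_grained_requests`, `coarse_grained_rows`) and the nonzeros are all assumptions made for the example.

```python
from bisect import bisect_right
from typing import Dict, List, Set, Tuple


def owner_of(index: int, bounds: List[int]) -> int:
    """Process owning global row `index` of B, given hypothetical partition
    `bounds` (process p owns rows bounds[p] .. bounds[p+1]-1)."""
    if not bounds[0] <= index < bounds[-1]:
        raise ValueError(f"index {index} outside the partition")
    return bisect_right(bounds, index) - 1


def fine_grained_requests(
    local_nonzeros: List[Tuple[int, int]], bounds: List[int], me: int
) -> Dict[int, Set[int]]:
    """Rows of B to fetch from each remote process: only rows whose index
    appears as the column of at least one local nonzero A[i, j]."""
    needed: Dict[int, Set[int]] = {}
    for _row, col in local_nonzeros:
        p = owner_of(col, bounds)
        if p != me:  # locally owned rows of B need no communication
            needed.setdefault(p, set()).add(col)
    return needed


def coarse_grained_rows(bounds: List[int], me: int) -> int:
    """Rows a naive scheme would receive: every remote block of B in full."""
    return bounds[-1] - (bounds[me + 1] - bounds[me])


if __name__ == "__main__":
    bounds = [0, 4, 8, 12]  # made-up: 3 processes, 12 rows in total
    # Made-up (row, col) nonzeros held by process 0 (rows 0..3 of A).
    nnz = [(0, 1), (0, 5), (1, 5), (2, 9), (3, 2), (3, 11)]
    req = fine_grained_requests(nnz, bounds, me=0)
    fine = sum(len(rows) for rows in req.values())
    print({p: sorted(rows) for p, rows in req.items()})  # {1: [5], 2: [9, 11]}
    print(fine, "vs", coarse_grained_rows(bounds, me=0))  # 3 vs 8
```

In this toy case, process 0 fetches 3 rows of B instead of 8. The saving depends entirely on how many distinct remote column indices the local nonzeros touch, which is why the strategy's benefit is tied to the sparsity patterns of the input matrices.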

arxiv:2512.20178 v2 · 2025-12-23 · cs.DC · cs.PF


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{7NH7VC5IJVQDP67TU5GSEZPLKV}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

SHIRO demonstrates strong scalability up to 128 GPUs, achieving geometric mean speedups of 221.5×, 56.0×, 23.4×, and 8.8× in SpMM over four state-of-the-art baselines (CAGNET, SPA, BCL, and CoLa, respectively) at this scale.
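C1 reports geometric means of per-dataset speedups. The following sketch shows how that aggregate is formed from paired timings. It assumes speedup is defined as baseline time divided by SHIRO time, which is the usual convention but is not spelled out on this page. The function name and the timings are hypothetical and unrelated to the paper's measurements.

```python
import math
from typing import Sequence


def geometric_mean_speedup(
    baseline_times: Sequence[float], new_times: Sequence[float]
) -> float:
    """Geometric mean of per-dataset speedups (baseline time / new time).

    Averages in log space so a product of many ratios cannot overflow.
    """
    if len(baseline_times) != len(new_times) or not baseline_times:
        raise ValueError("need equal-length, non-empty timing lists")
    logs = [math.log(b / n) for b, n in zip(baseline_times, new_times)]
    return math.exp(math.fsum(logs) / len(logs))


if __name__ == "__main__":
    # Made-up timings (seconds) for three datasets.
    baseline = [2.0, 8.0, 4.0]
    candidate = [1.0, 1.0, 1.0]
    print(round(geometric_mean_speedup(baseline, candidate), 3))  # 4.0
```

The geometric mean is the common choice for averaging ratios. A few very large per-dataset ratios do not dominate it the way they would dominate an arithmetic mean. It also gives consistent answers in both directions: the geometric mean of the reciprocal ratios is the reciprocal of the geometric mean.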

C2 · weakest assumption

The sparsity patterns present in the evaluated real-world datasets allow the fine-grained strategy to eliminate most redundant transfers, and the target hardware uses a two-tier GPU network where intra-node links are substantially faster than inter-node links.

C3 · one-line summary

SHIRO achieves geometric mean speedups of 8.8× to 221.5× over four baselines in distributed SpMM on up to 128 GPUs by exploiting sparsity patterns and two-tier network topologies.

References

49 extracted · 49 resolved · 2 Pith anchors

[1] All-pairs shortest paths computation in the BSP model, 2001

[2] RDMA-based algorithms for sparse matrix multiplication on GPUs, 2024

[3] The block conjugate gradient algorithm and related methods, 1980

[4] A shifted block Lanczos algorithm for solving sparse symmetric generalized eigenproblems, 1994

[5] A block Arnoldi-Chebyshev method for computing the leading eigenpairs of large sparse unsymmetric matrices, 1993

Formal links

2 machine-checked theorem links

Receipt and verification

First computed	2026-05-18T03:09:32.471914Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

fb4ffa8ba84d6037fbf3a74d2265eb557d6e7c2dda2d47a9d7ba4cda05685854

Aliases

arxiv: 2512.20178 · arxiv_version: 2512.20178v2 · doi: 10.48550/arxiv.2512.20178 · pith_short_12: 7NH7VC5IJVQD · pith_short_16: 7NH7VC5IJVQDP67T · pith_short_8: 7NH7VC5I
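The identifiers on this page fit a simple pattern that can be checked offline. The 26-character Pith Number equals the first 26 characters of the RFC 4648 base32 encoding of the canonical hash, and each `pith_short_N` alias is the first N characters of the Pith Number. This relationship is inferred from the values shown here rather than taken from a published specification, so treat both the derivation and the helper names (`pith_number_from_hash`, `short_aliases`) as assumptions.

```python
import base64
from typing import Dict


def pith_number_from_hash(canonical_hash_hex: str, length: int = 26) -> str:
    """First `length` characters of the RFC 4648 base32 encoding of the
    SHA-256 digest (an inferred rule, not a documented one)."""
    digest = bytes.fromhex(canonical_hash_hex)
    return base64.b32encode(digest).decode("ascii")[:length]


def short_aliases(pith_number: str) -> Dict[str, str]:
    """The pith_short_N aliases on this page are N-character prefixes."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


if __name__ == "__main__":
    h = "fb4ffa8ba84d6037fbf3a74d2265eb557d6e7c2dda2d47a9d7ba4cda05685854"
    number = pith_number_from_hash(h)
    print(number)  # 7NH7VC5IJVQDP67TU5GSEZPLKV
    print(short_aliases(number))
    # {'pith_short_8': '7NH7VC5I', 'pith_short_12': '7NH7VC5IJVQD', 'pith_short_16': '7NH7VC5IJVQDP67T'}
```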

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/7NH7VC5IJVQDP67TU5GSEZPLKV \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: fb4ffa8ba84d6037fbf3a74d2265eb557d6e7c2dda2d47a9d7ba4cda05685854
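The Python one-liner in the pipeline above canonicalizes and hashes the record. The same step can be written as a standalone function; the name `canonical_sha256` is hypothetical. Keys are sorted and whitespace is removed before hashing, so two JSON documents that differ only in key order or formatting produce the same digest. The sample dictionaries are fragments of the canonical record shown on this page, so their digest is not expected to equal the canonical hash.

```python
import hashlib
import json


def canonical_sha256(record: object) -> str:
    """SHA-256 hex digest of `record` serialized like the one-liner above:
    sorted keys, compact separators, non-ASCII characters kept as UTF-8."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


if __name__ == "__main__":
    a = {"source": {"kind": "arxiv", "id": "2512.20178", "version": 2}, "schema_version": "1.0"}
    b = {"schema_version": "1.0", "source": {"version": 2, "id": "2512.20178", "kind": "arxiv"}}
    # Same content with a different key order (including nested keys): digests match.
    print(canonical_sha256(a) == canonical_sha256(b))  # True
```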

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "bdeca8ee6a43454ff0a015f230839c0ae3c281bc08782a7aae5bfc65bfe53338",
    "cross_cats_sorted": [
      "cs.PF"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.DC",
    "submitted_at": "2025-12-23T09:16:52Z",
    "title_canon_sha256": "7b507e33d65dafa141601d7b49b33bf4cd00a1a9f0f98411add37bae730ce762"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2512.20178",
    "kind": "arxiv",
    "version": 2
  }
}