pith:XR4JGHSU
Exceeding the Numerical and Performance Characteristics of IEEE-754 SGEMM with BFloat16 Tensor Cores on GPUs for Scientific Computing
Using BFloat16 tensor cores with FP32 accumulation on GPUs exceeds the speed and numerical accuracy of native IEEE-754 FP32 SGEMM for scientific workloads.
arxiv:2605.16617 v1 · 2026-05-15 · cs.DC
Add to your LaTeX paper
```latex
\usepackage{pith}
\pithnumber{XR4JGHSU2QQVVMICQ52A3RJJOM}
```
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
This paper examines the performance, efficiency, power, and numerical characteristics of FP32 matrix multiplication via BF16-based emulation and demonstrates that it exceeds the numerical and performance characteristics of native FP32 for scientific applications.
Key assumption: BF16 tensor core operations accumulated into FP32 accumulators, combined with Blackwell-specific scaling hardware, produce results that are both faster than and numerically superior to native IEEE-754 FP32 SGEMM across relevant scientific workloads, without hidden accuracy losses from rounding or denormal handling.
BF16 tensor cores on GPUs emulate FP32 SGEMM with better performance, power efficiency, and numerical accuracy than native FP32; the work includes a library implementation that handles denormals.
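The claims do not spell out the emulation scheme. A common way to emulate FP32 products on BF16 hardware, and only an assumption about what the paper does, is to split each FP32 operand into three BF16 terms (hi + mid + lo), multiply the terms pairwise (a BF16 × BF16 product has at most 16 significant bits, so it is exact in FP32), and accumulate in FP32. The pure-Python sketch below simulates that arithmetic bit-exactly with `struct`. The helper names (`to_bf16`, `split_bf16x3`, `emulated_dot`) and the choice to keep six of the nine cross products are hypothetical and illustrative, not the paper's library API. The last lines show why denormal handling matters: near the bottom of the FP32 range the residual terms fall into the BF16 subnormal range and bits are lost.

```python
import struct

def from_bits(u: int) -> float:
    """Interpret a 32-bit pattern as an IEEE-754 binary32 value."""
    return struct.unpack("<f", struct.pack("<I", u))[0]

def to_bits(x: float) -> int:
    """Return the binary32 bit pattern of x (x must lie in binary32 range)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return from_bits(to_bits(x))

def to_bf16(x: float) -> float:
    """Round a finite binary32 value to bfloat16 (round-to-nearest-even).

    bfloat16 is the upper 16 bits of a binary32 word, so rounding is done on
    the bit pattern; the result is returned widened to a Python float.
    """
    u = to_bits(x)
    u += 0x7FFF + ((u >> 16) & 1)
    return from_bits(u & 0xFFFF0000)

def split_bf16x3(x: float) -> tuple[float, float, float]:
    """Hypothetical helper: split binary32 x into bf16 terms, x ~= hi + mid + lo."""
    hi = to_bf16(x)
    r = f32(x - hi)            # residual is exactly representable for normal x
    mid = to_bf16(r)
    lo = to_bf16(f32(r - mid))
    return hi, mid, lo

def emulated_dot(a: list[float], b: list[float]) -> float:
    """Assumed scheme: FP32 dot product from bf16 products, FP32 accumulator.

    Keeps the six largest of the nine cross products per element pair
    (drops mid*lo, lo*mid, lo*lo) and adds them from smallest to largest.
    Index 0 = hi, 1 = mid, 2 = lo.
    """
    pairs = [(1, 1), (0, 2), (2, 0), (0, 1), (1, 0), (0, 0)]
    sa = [split_bf16x3(f32(x)) for x in a]
    sb = [split_bf16x3(f32(y)) for y in b]
    acc = 0.0
    for i, j in pairs:
        for xa, yb in zip(sa, sb):
            # bf16 x bf16 is exact (barring underflow); only the FP32 add rounds.
            acc = f32(acc + xa[i] * yb[j])
    return acc

a = [0.1, 0.2, 0.3, 0.4]
b = [1.5, 2.5, 3.5, 4.5]
ref = sum(f32(x) * f32(y) for x, y in zip(a, b))  # binary64 reference on FP32 inputs
print(emulated_dot(a, b), ref)

x_normal = from_bits(0x3F812345)   # normal binary32 value near 1.0
x_tiny = from_bits(0x00812345)     # normal, but just above the FP32 minimum
print(sum(split_bf16x3(x_normal)) == x_normal)  # True: split is exact
print(sum(split_bf16x3(x_tiny)) == x_tiny)      # False: residual lost to bf16 subnormals
```

For normal inputs the three-term split reconstructs the FP32 value exactly, so the remaining error comes only from FP32 accumulation and the dropped cross terms. The sketch only illustrates the loss near the subnormal range; it does not reproduce how the paper's library handles denormals.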
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-20T00:02:32.740281Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) |
| Schema | pith-number/v1.0 |
Canonical hash
bc78931e54d4215ab10287740dc529733a66a40194bdffc0efe3d6398b634e7c
Verify this Pith Number yourself
```shell
curl -sH 'Accept: application/ld+json' https://pith.science/pith/XR4JGHSU2QQVVMICQ52A3RJJOM \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: bc78931e54d4215ab10287740dc529733a66a40194bdffc0efe3d6398b634e7c
```
Canonical record JSON
```json
{
  "metadata": {
    "abstract_canon_sha256": "7202c08d7290dacfc13424c7f0b728bb8c095152f517a436d62c3b8c1ceae93d",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.DC",
    "submitted_at": "2026-05-15T20:37:49Z",
    "title_canon_sha256": "55ff7533e0d1ea6e01754bff217154a5b4c9af3e35c4186304d5500b3b07d9bd"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.16617",
    "kind": "arxiv",
    "version": 1
  }
}
```
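The canonicalization done by the `jq`/`python3` verification pipeline can also be written as a small standalone function, which makes the hashing rule explicit: keys sorted, no whitespace between tokens, UTF-8 output without ASCII escaping, then SHA-256 over those bytes. This is a sketch that mirrors the one-liner rather than an official client; `canonical_sha256` is a hypothetical helper name, and the record literal is copied from the canonical record JSON on this page.

```python
import hashlib
import json

def canonical_sha256(record: dict) -> str:
    """Hash a record the way the verification one-liner does: sorted keys,
    compact separators, non-ASCII characters kept as UTF-8, then SHA-256."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Record copied from the "Canonical record JSON" shown on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "7202c08d7290dacfc13424c7f0b728bb8c095152f517a436d62c3b8c1ceae93d",
        "cross_cats_sorted": [],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.DC",
        "submitted_at": "2026-05-15T20:37:49Z",
        "title_canon_sha256": "55ff7533e0d1ea6e01754bff217154a5b4c9af3e35c4186304d5500b3b07d9bd",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.16617", "kind": "arxiv", "version": 1},
}

print(canonical_sha256(record))
```

Because keys are sorted before hashing, the digest does not depend on the order in which the API emits fields. If the JSON shown is the complete `canonical_record` returned by the API, the printed digest should match the canonical hash listed on this page.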