Pith Number

pith:KZGL6CQJ

pith:2022:KZGL6CQJWFLXZEDBB2SHHCPXD5
not attested · not anchored · not stored · refs resolved

FP8 Formats for Deep Learning

Alexander Heinecke, Dusan Stosic, Hao Wu, John Kamalu, Marius Cornea, Michael Siu, Mohammad Shoeybi, Naveen Mellempudi, Neil Burgess, Patrick Judd, Paulius Micikevicius, Pradeep Dubey, Richard Grisenthwaite, Sangwon Ha, Stuart Oberman

FP8 with E4M3 and E5M2 encodings matches 16-bit training accuracy on large language and image models without hyperparameter changes.

arxiv:2209.05433 v2 · 2022-09-12 · cs.LG

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{KZGL6CQJWFLXZEDBB2SHHCPXD5}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

We demonstrate the efficacy of the FP8 format on a variety of image and language tasks, effectively matching the result quality achieved by 16-bit training sessions. Our study covers the main modern neural network architectures - CNNs, RNNs, and Transformer-based models, leaving all the hyperparameters unchanged from the 16-bit baseline training sessions. Our training experiments include large, up to 175B parameter, language models.

C2 weakest assumption

That the chosen E4M3 and E5M2 encodings will preserve accuracy across all tasks and model scales without any hyperparameter retuning or task-specific adjustments.

C3 one line summary

FP8 formats E4M3 and E5M2 match 16-bit training accuracy on CNNs, RNNs, and Transformers up to 175B parameters without hyperparameter changes.
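
To make the two encodings named in these claims concrete, here is a minimal Python sketch that decodes a single FP8 byte into a float. The bit layouts, exponent biases (7 for E4M3, 15 for E5M2), and special-value rules are not stated in this record; they are assumptions taken from the usual description of the formats: E5M2 follows IEEE-style infinities and NaNs, while E4M3 has no infinities and reserves only the all-ones exponent-and-mantissa pattern for NaN. The function names are hypothetical.

```python
import math

def decode_fp8(byte: int, exp_bits: int, man_bits: int, bias: int,
               ieee_specials: bool) -> float:
    """Decode one FP8 byte laid out as sign | exponent | mantissa (assumed layout)."""
    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exp = (byte >> man_bits) & ((1 << exp_bits) - 1)
    man = byte & ((1 << man_bits) - 1)
    exp_max = (1 << exp_bits) - 1
    man_max = (1 << man_bits) - 1

    if exp == exp_max:
        if ieee_specials:
            # E5M2-style: all-ones exponent encodes infinity or NaN.
            return sign * math.inf if man == 0 else math.nan
        if man == man_max:
            # E4M3-style: only the all-ones pattern is NaN; no infinities.
            return math.nan

    frac = man / (1 << man_bits)
    if exp == 0:
        # Subnormal: no implicit leading one, fixed exponent of 1 - bias.
        return sign * frac * 2.0 ** (1 - bias)
    return sign * (1.0 + frac) * 2.0 ** (exp - bias)

def decode_e4m3(byte: int) -> float:
    return decode_fp8(byte, exp_bits=4, man_bits=3, bias=7, ieee_specials=False)

def decode_e5m2(byte: int) -> float:
    return decode_fp8(byte, exp_bits=5, man_bits=2, bias=15, ieee_specials=True)

print(decode_e4m3(0x7E))  # 448.0, the largest finite E4M3 value under these assumptions
print(decode_e5m2(0x7B))  # 57344.0, the largest finite E5M2 value under these assumptions
```

Under these assumptions E4M3 trades dynamic range for an extra mantissa bit, while E5M2 keeps a wider exponent range at lower precision.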

References

26 extracted · 26 resolved · 7 Pith anchors


Formal links

1 machine-checked theorem link

Cited by

39 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:52.855387Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

564cbf0a09b1577c90610ea47389f71f657f9815920159d0397f4be07f509302

Aliases

arxiv: 2209.05433 · arxiv_version: 2209.05433v2 · doi: 10.48550/arxiv.2209.05433 · pith_short_12: KZGL6CQJWFLX · pith_short_16: KZGL6CQJWFLXZEDB · pith_short_8: KZGL6CQJ
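
The identifiers on this page appear to be related: the full Pith Number matches the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash, and the short aliases are prefixes of it. This is an observation from the values shown here, not a documented rule, so treat the sketch below as an assumption about the scheme. The helper name `base32_prefix` is hypothetical.

```python
import base64

# Values copied from this record.
canonical_hash = "564cbf0a09b1577c90610ea47389f71f657f9815920159d0397f4be07f509302"
pith_number = "KZGL6CQJWFLXZEDBB2SHHCPXD5"

def base32_prefix(hex_digest: str, length: int) -> str:
    """Base32-encode the raw digest bytes and keep the first `length` characters."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]

derived = base32_prefix(canonical_hash, 26)
print(derived == pith_number)             # True for the values on this page
print(base32_prefix(canonical_hash, 8))   # KZGL6CQJ, matching pith_short_8
```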
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/KZGL6CQJWFLXZEDBB2SHHCPXD5 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 564cbf0a09b1577c90610ea47389f71f657f9815920159d0397f4be07f509302
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "c520978422b7a453110ed090b9d0c9b9786712d389b5a932b2e5184ded23748b",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2022-09-12T17:39:55Z",
    "title_canon_sha256": "a787abcfcac8027bbbf57bf19b5a9d2496b30a55d097c56c0d787ecb293565af"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2209.05433",
    "kind": "arxiv",
    "version": 2
  }
}
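
The same check can be run offline against the record printed above. This is a minimal sketch that mirrors the serialization options used in the shell pipeline (sorted keys, compact separators, no ASCII escaping, UTF-8 encoding); the function name `canonical_sha256` is hypothetical, and the resulting digest should be compared by hand with the Canonical hash listed on this page.

```python
import hashlib
import json

# Canonical record as shown on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "c520978422b7a453110ed090b9d0c9b9786712d389b5a932b2e5184ded23748b",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2022-09-12T17:39:55Z",
        "title_canon_sha256": "a787abcfcac8027bbbf57bf19b5a9d2496b30a55d097c56c0d787ecb293565af",
    },
    "schema_version": "1.0",
    "source": {"id": "2209.05433", "kind": "arxiv", "version": 2},
}

def canonical_sha256(obj) -> str:
    """Serialize with sorted keys and compact separators, then hash the UTF-8 bytes."""
    canon = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canon.encode("utf-8")).hexdigest()

digest = canonical_sha256(record)
print(digest)  # compare with the Canonical hash value on this page
```

Because keys are sorted before hashing, the digest does not depend on the order in which a mirror happens to store the fields.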