Pith Number

pith:KZGL6CQJ

pith:2022:KZGL6CQJWFLXZEDBB2SHHCPXD5

not attested · not anchored · not stored · refs resolved

FP8 Formats for Deep Learning

Alexander Heinecke, Dusan Stosic, Hao Wu, John Kamalu, Marius Cornea, Michael Siu, Mohammad Shoeybi, Naveen Mellempudi, Neil Burgess, Patrick Judd, Paulius Micikevicius, Pradeep Dubey, Richard Grisenthwaite, Sangwon Ha, Stuart Oberman

FP8 with E4M3 and E5M2 encodings matches 16-bit training accuracy on large language and image models without hyperparameter changes.

arxiv:2209.05433 v2 · 2022-09-12 · cs.LG


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{KZGL6CQJWFLXZEDBB2SHHCPXD5}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
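
This page does not specify the merge algorithm or the event schema. The Python sketch below only illustrates why a deterministic merge lets independent mirrors agree: events are deduplicated and then applied in one fixed total order, so arrival order stops mattering. The event fields (`id`, `ts`, `field`, `value`), the last-writer-wins rule, and the `merge` function are hypothetical, not Pith's actual design.

```python
import itertools
import json
from typing import Any

# Hypothetical event shape: {"id": str, "ts": ISO-8601 str, "field": str, "value": Any}.
# Not the real Pith schema, which this page does not describe.
Event = dict[str, Any]


def merge(canonical: dict[str, Any], events: list[Event]) -> dict[str, Any]:
    """Fold events into a copy of the canonical record in a fixed total order.

    Assumes events sharing an id are identical (e.g. the id is a hash of the
    signed event), so deduplicating by id cannot depend on arrival order.
    """
    unique = {e["id"]: e for e in events}
    state = dict(canonical)
    # Sorting by (timestamp, id) gives every mirror the same application order.
    for e in sorted(unique.values(), key=lambda e: (e["ts"], e["id"])):
        state[e["field"]] = e["value"]  # later events overwrite earlier ones
    return state


# Usage: every arrival order of the same hypothetical events yields one state.
events = [
    {"id": "e2", "ts": "2026-05-17T10:00:00Z", "field": "status", "value": "claimed"},
    {"id": "e1", "ts": "2026-05-17T09:00:00Z", "field": "status", "value": "open"},
    {"id": "e3", "ts": "2026-05-17T11:00:00Z", "field": "note", "value": "mirrored"},
]
states = {
    json.dumps(merge({}, list(order)), sort_keys=True)
    for order in itertools.permutations(events)
}
print(len(states))  # 1
```

Sorting on a total order, rather than trusting delivery order, is what makes the result reproducible on any mirror.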

Claims

C1 strongest claim

We demonstrate the efficacy of the FP8 format on a variety of image and language tasks, effectively matching the result quality achieved by 16-bit training sessions. Our study covers the main modern neural network architectures: CNNs, RNNs, and Transformer-based models, leaving all the hyperparameters unchanged from the 16-bit baseline training sessions. Our training experiments include large language models of up to 175B parameters.

C2 weakest assumption

That the chosen E4M3 and E5M2 encodings will preserve accuracy across all tasks and model scales without any hyperparameter retuning or task-specific adjustments.

C3 one-line summary

FP8 formats E4M3 and E5M2 match 16-bit training accuracy on CNNs, RNNs, and Transformers up to 175B parameters without hyperparameter changes.
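
To make the two encodings concrete, here is a minimal Python sketch of decoding and rounding. The format parameters follow the paper's format definitions rather than anything stated on this page: E4M3 has 4 exponent bits, 3 mantissa bits, bias 7, no infinities and a single NaN pattern (S.1111.111); E5M2 has 5 exponent bits, 2 mantissa bits, bias 15, and IEEE-style infinities and NaNs. The names `decode_fp8`, `finite_codes`, and `quantize_fp8` are illustrative. Rounding is round-to-nearest-even with saturation to the largest finite value, which is one possible casting policy and not necessarily the one used in the paper's experiments.

```python
import math

# Assumed parameters: (exponent bits, mantissa bits, exponent bias).
FORMATS = {"E4M3": (4, 3, 7), "E5M2": (5, 2, 15)}


def decode_fp8(code: int, fmt: str) -> float:
    """Decode one 8-bit FP8 code (0..255) into a Python float."""
    ebits, mbits, bias = FORMATS[fmt]
    sign = -1.0 if code & 0x80 else 1.0
    exp = (code >> mbits) & ((1 << ebits) - 1)
    man = code & ((1 << mbits) - 1)
    all_ones_exp = (1 << ebits) - 1
    if fmt == "E5M2" and exp == all_ones_exp:
        # IEEE-style specials: zero mantissa is infinity, anything else is NaN.
        return sign * math.inf if man == 0 else math.nan
    if fmt == "E4M3" and exp == all_ones_exp and man == (1 << mbits) - 1:
        # E4M3 has no infinities; only S.1111.111 encodes NaN.
        return math.nan
    if exp == 0:
        # Subnormal: no implicit leading 1, exponent fixed at 1 - bias.
        return sign * (man / (1 << mbits)) * 2.0 ** (1 - bias)
    return sign * (1 + man / (1 << mbits)) * 2.0 ** (exp - bias)


def finite_codes(fmt: str) -> list[tuple[float, int]]:
    """All (value, code) pairs whose decoded value is finite."""
    pairs = [(decode_fp8(c, fmt), c) for c in range(256)]
    return [(v, c) for v, c in pairs if math.isfinite(v)]


def quantize_fp8(x: float, fmt: str) -> float:
    """Round x to the nearest finite FP8 value, ties to even mantissa.

    Out-of-range inputs (including +/-inf) saturate to the largest finite
    magnitude, a modelling choice for this sketch.
    """
    if math.isnan(x):
        return math.nan
    table = finite_codes(fmt)
    max_finite = max(v for v, _ in table)
    x = max(-max_finite, min(max_finite, x))  # saturate
    # Neighbouring values have neighbouring codes, so on a tie the candidate
    # whose mantissa LSB (code bit 0) is 0 is the "even" one.
    value, _ = min(table, key=lambda vc: (abs(vc[0] - x), vc[1] & 1))
    return math.copysign(value, x) if value == 0 else value
```

For example, `quantize_fp8(1000.0, "E4M3")` saturates to 448.0, the E4M3 maximum, while E5M2 reaches 57344.0 with one fewer mantissa bit: the range-versus-precision trade-off between the two encodings.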

References

26 extracted · 26 resolved · 7 Pith anchors

[1] Michael J. Anderson, Benny Chen, Stephen Chen, Summer Deng, Jordan Fix, Michael Gschwind, Aravind Kalaiah, Changkyu Kim, Jaewon Lee, Jason Liang, Haixin Liu, Yinghai Lu, Jack Montgomery, Arun Moorthy, 2021

[2] Language models are few-shot learners 2020

[3] Bfloat16 processing for neural networks 2019

[4] Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, Parker Schuh, Kensen Shi, Sasha Tsvyashc 2022

[5] Binaryconnect: Training deep neural networks with binary weights during propagations 2015

Formal links

1 machine-checked theorem link

Cited by

39 papers in Pith

Decomposing MXFP4 quantization error for LLM reinforcement learning: reducible bias, recoverable deadzone, and an irreducible floor

Recursive Block-Diagonal Coupling for Resource-Efficient Training of Vision Models

NVILA: Efficient Frontier Visual Language Models

Balancing FP8 Computation Accuracy and Efficiency on Digital CIM via Shift-Aware On-the-fly Aligned-Mantissa Bitwidth Prediction

Receipt and verification

First computed	2026-05-17T23:38:52.855387Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`)
Schema	pith-number/v1.0

Canonical hash

564cbf0a09b1577c90610ea47389f71f657f9815920159d0397f4be07f509302

Aliases

arxiv: 2209.05433 · arxiv_version: 2209.05433v2 · doi: 10.48550/arxiv.2209.05433 · pith_short_12: KZGL6CQJWFLX · pith_short_16: KZGL6CQJWFLXZEDB · pith_short_8: KZGL6CQJ
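
The identifiers on this page appear to be related by simple encodings. As a sketch inferred from this one record (not a documented Pith specification), the 26-character Pith Number equals the first 26 characters of the RFC 4648 base32 encoding of the canonical SHA-256 hash, i.e. its first 130 bits, and the `pith_short_8`, `pith_short_12`, and `pith_short_16` aliases are prefixes of it. The helper names `pith_number_from_hash` and `short_aliases` are hypothetical.

```python
import base64

# Values as printed on this record page.
CANONICAL_HASH = "564cbf0a09b1577c90610ea47389f71f657f9815920159d0397f4be07f509302"
PITH_NUMBER = "KZGL6CQJWFLXZEDBB2SHHCPXD5"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Assumed derivation: base32 of the raw SHA-256 digest, truncated.

    26 base32 characters carry 26 * 5 = 130 bits of the 256-bit digest.
    """
    return base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")[:length]


def short_aliases(pith_number: str) -> dict[str, str]:
    """The pith_short_N aliases listed above are N-character prefixes."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


derived = pith_number_from_hash(CANONICAL_HASH)
print(derived == PITH_NUMBER)  # True
print(short_aliases(derived))
```

If this inference holds generally, a mirror can check a Pith Number offline from the canonical hash alone.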

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/KZGL6CQJWFLXZEDBB2SHHCPXD5 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 564cbf0a09b1577c90610ea47389f71f657f9815920159d0397f4be07f509302

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "c520978422b7a453110ed090b9d0c9b9786712d389b5a932b2e5184ded23748b",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2022-09-12T17:39:55Z",
    "title_canon_sha256": "a787abcfcac8027bbbf57bf19b5a9d2496b30a55d097c56c0d787ecb293565af"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2209.05433",
    "kind": "arxiv",
    "version": 2
  }
}
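
The verification command above hashes a canonical JSON serialization: sorted keys, no insignificant whitespace, and non-ASCII characters kept as UTF-8 rather than escaped. A stdlib-only Python equivalent is sketched below against the record as displayed. Whether it reproduces the published canonical hash depends on the displayed JSON being exactly the `canonical_record` object the resolver returns, which is an assumption here; `canonical_bytes` and `canonical_hash` are hypothetical helper names.

```python
import hashlib
import json

# Copy of the canonical record shown above (assumed to match the resolver's
# `canonical_record` object exactly).
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "c520978422b7a453110ed090b9d0c9b9786712d389b5a932b2e5184ded23748b",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2022-09-12T17:39:55Z",
        "title_canon_sha256": "a787abcfcac8027bbbf57bf19b5a9d2496b30a55d097c56c0d787ecb293565af",
    },
    "schema_version": "1.0",
    "source": {"id": "2209.05433", "kind": "arxiv", "version": 2},
}


def canonical_bytes(obj) -> bytes:
    """Same serialization as the verification one-liner: sorted keys,
    compact separators, UTF-8 without ASCII escaping."""
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def canonical_hash(obj) -> str:
    """Hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(obj)).hexdigest()


print(canonical_hash(RECORD))
```

Because keys are sorted during serialization, two mirrors that store the same record with different key order still compute the same digest.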