{"work":{"id":"c7217bc9-e53d-40c5-a437-25a74abc36cd","openalex_id":null,"doi":null,"arxiv_id":"2209.05433","raw_key":null,"title":"FP8 Formats for Deep Learning","authors":null,"authors_text":"Paulius Micikevicius, Dusan Stosic, Neil Burgess, Marius Cornea, Pradeep Dubey, Richard Grisenthwaite","year":2022,"venue":"cs.LG","abstract":"FP8 is a natural progression for accelerating deep learning training and inference beyond the 16-bit formats common in modern processors. In this paper we propose an 8-bit floating point (FP8) binary interchange format consisting of two encodings - E4M3 (4-bit exponent and 3-bit mantissa) and E5M2 (5-bit exponent and 2-bit mantissa). While E5M2 follows IEEE 754 conventions for representation of special values, E4M3's dynamic range is extended by not representing infinities and having only one mantissa bit-pattern for NaNs. We demonstrate the efficacy of the FP8 format on a variety of image and language tasks, effectively matching the result quality achieved by 16-bit training sessions. Our study covers the main modern neural network architectures - CNNs, RNNs, and Transformer-based models, leaving all the hyperparameters unchanged from the 16-bit baseline training sessions. Our training experiments include large, up to 175B parameter, language models. 
We also examine FP8 post-training-quantization of language models trained using 16-bit formats that resisted fixed point int8 quantization.","external_url":"https://arxiv.org/abs/2209.05433","cited_by_count":null,"metadata_source":"pith","metadata_fetched_at":"2026-07-02T13:36:59.285147+00:00","pith_arxiv_id":"2209.05433","created_at":"2026-05-08T23:49:28.447389+00:00","updated_at":"2026-07-02T13:36:59.285147+00:00","title_quality_ok":false,"display_title":"FP8 Formats for Deep Learning","render_title":"FP8 Formats for Deep Learning"},"hub":{"state":{"work_id":"c7217bc9-e53d-40c5-a437-25a74abc36cd","tier":"hub","tier_reason":"10+ Pith inbound or 1,000+ external citations","pith_inbound_count":49,"external_cited_by_count":null,"distinct_field_count":10,"first_pith_cited_at":"2024-07-11T15:44:48+00:00","last_pith_cited_at":"2026-07-01T16:13:03+00:00","author_build_status":"not_needed","summary_status":"needed","contexts_status":"needed","graph_status":"needed","ask_index_status":"not_needed","reader_status":"not_needed","recognition_status":"not_needed","updated_at":"2026-07-02T19:32:49.936169+00:00","tier_text":"hub"},"tier":"hub","role_counts":[{"context_role":"background","n":9},{"context_role":"other","n":1}],"polarity_counts":[{"context_polarity":"background","n":8},{"context_polarity":"unclear","n":2}],"runs":{},"summary":{},"graph":{},"authors":[]}}