
Double-Precision Matrix Multiplication Emulation via Ozaki-II Scheme with FP8 Quantization

2 Pith papers cite this work. Polarity classification is still indexing.

abstract

In this paper, we propose a method for emulating double-precision general matrix-matrix multiplication (DGEMM), a fundamental and performance-critical kernel in many high-performance computing applications. Ozaki-I and Ozaki-II are established DGEMM emulation schemes via low-precision matrix multiply-accumulate (MMA) units. For the Ozaki-I scheme, INT8-, FP8-, and FP16-based implementations have been proposed, all of which can be realized based on the same underlying algorithmic structure. In contrast, although INT8-based implementations of the Ozaki-II scheme have been reported, the original algorithm cannot be directly adapted to exploit FP8 MMA units. In several recent architectures, such as NVIDIA Blackwell Ultra and NVIDIA Rubin, INT8 performance has been reduced, making reliance on INT8 alone insufficient. Therefore, we introduce a novel technique to demonstrate DGEMM emulation based on the Ozaki-II scheme that operates on FP8 MMA units. Compared to the FP8-based Ozaki-I scheme, our method significantly reduces the computational cost and enables efficient FP64 emulation.

fields

cs.MS 2

years

2026 2

verdicts

UNVERDICTED 2

representative citing papers

Improved Scaling for Fast Mode of Ozaki Scheme II

cs.MS · 2026-06-28 · unverdicted · novelty 6.0

A scale-invariant revision to the fast-mode scaling formula in Ozaki scheme II ensures the CRT uniqueness condition holds for all input scalings while preserving the speed of the original fast mode.
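
The CRT uniqueness condition mentioned in this summary can be illustrated with a small integer example. The sketch below is not the scaling formula of either paper; it only shows the general idea, assuming integer inputs and pairwise coprime moduli: an integer matrix product is computed modulo each modulus and recombined by the Chinese remainder theorem, and the result is exact only when twice the largest absolute entry of the product is smaller than the product of the moduli. The function names (`matmul_mod`, `crt_symmetric`, `crt_matmul`) and the moduli are hypothetical choices made for illustration.

```python
from math import prod

def matmul_mod(A, B, m):
    """Integer matrix product with every entry reduced modulo m."""
    return [[sum(a * b for a, b in zip(row, col)) % m for col in zip(*B)]
            for row in A]

def crt_symmetric(residues, moduli):
    """Return the integer x in the symmetric range around 0 with
    x = r_i (mod m_i) for all i. Assumes pairwise coprime moduli."""
    M = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)  # modular inverse, Python 3.8+
    x %= M
    # Map the upper half of [0, M) to negative values.
    return x - M if x >= (M + 1) // 2 else x

def crt_matmul(A, B, moduli):
    """Recombine the per-modulus products entry by entry."""
    parts = [matmul_mod(A, B, m) for m in moduli]
    return [[crt_symmetric([p[i][j] for p in parts], moduli)
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[120, -77], [-3, 100]]
B = [[-90, 15], [64, -128]]
exact = [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
         for row in A]
max_abs = max(abs(v) for row in exact for v in row)

# Uniqueness holds: 2 * max_abs < 251 * 253, so the product is recovered.
ok = crt_matmul(A, B, [251, 253]) == exact
# Uniqueness fails: 2 * max_abs >= 251, so entries wrap around.
bad = crt_matmul(A, B, [251]) == exact
```

In this toy setting the condition depends on the magnitude of the inputs, which is why a scaling step that bounds the entries before the modular products matters for correctness.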
