Title resolution pending
2 Pith papers cite this work. Polarity classification is still being indexed.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers
- FP8 is All You Need (Part 2): Efficient Ozaki-Bailey Style FFT Through Tensor-core Garner Reformulation and Kulisch Escape Route
The Ozaki-Bailey 3D FFT achieves near memory-roof FP64 performance on B300 GPUs by emulating FP64 on FP8 tensor cores, with Garner reconstruction split into phases and a Kulisch escape route on INT32 units.
- FP8 is All You Need (Part 1): Debunking Hardware FP64 as the HPC Holy Grail
FP8 with Ozaki II recovers memory-roof FP64 performance on B300 GPUs, exceeding native FP64 throughput by an order of magnitude in compute-bound regimes.