The Ozaki-Bailey 3D FFT achieves near-memory-roofline FP64 performance on B300 by emulating FP64 arithmetic with FP8 tensor cores, with Garner reconstruction split into phases and a Kulisch escape route on the INT32 units.
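As a rough illustration of the Garner step mentioned above, the sketch below shows generic CRT-based reconstruction of an exact integer dot product from per-modulus residues, using Garner's mixed-radix algorithm. It is not the paper's phased tensor-core implementation. Python integers stand in for the exact low-precision products a tensor core would compute. The moduli and the helper names `garner` and `modular_dot` are hypothetical choices made only for this example.

```python
from math import prod


def garner(residues, moduli):
    """Return the unique x in [0, prod(moduli)) with x % m_i == r_i.

    Garner's mixed-radix algorithm; the moduli must be pairwise coprime.
    """
    digits = []  # mixed-radix digits v_0, v_1, ...
    for i, (r, m) in enumerate(zip(residues, moduli)):
        # Evaluate the partial mixed-radix number (digits so far) modulo m.
        partial, base = 0, 1
        for j in range(i):
            partial = (partial + digits[j] * base) % m
            base = (base * moduli[j]) % m
        # Solve v_i * base == r - partial (mod m); base is invertible mod m.
        digits.append(((r - partial) * pow(base, -1, m)) % m)
    # Expand x = v_0 + v_1*m_0 + v_2*m_0*m_1 + ...
    x, base = 0, 1
    for v, m in zip(digits, moduli):
        x += v * base
        base *= m
    return x


def modular_dot(a, b, moduli):
    """Exact integer dot product via residues (hypothetical helper).

    Each per-modulus dot product involves only small operands, which is what
    lets low-precision units compute it exactly. The result is correct only
    if |a . b| < prod(moduli) / 2 (symmetric range).
    """
    M = prod(moduli)
    residues = [sum((x % m) * (y % m) for x, y in zip(a, b)) % m for m in moduli]
    x = garner(residues, moduli)
    # Map from [0, M) back to the symmetric signed range.
    return x - M if x > M // 2 else x
```

For example, `garner([2, 3, 2], [3, 5, 7])` returns `23`, and `modular_dot([123, -45, 678], [-9, 1000, 31], [251, 253, 255, 256])` returns `-25089`, which matches the direct dot product. Splitting the work this way trades one wide multiplication for several narrow ones plus a reconstruction pass, and that reconstruction pass is the step the summary describes as being split into phases.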
Cited by 1 Pith paper (field: cs.MS; year: 2026; verdict: unverdicted). Representative citing paper:
FP8 is All You Need (Part 2): Efficient Ozaki-Bailey Style FFT Through Tensor-core Garner Reformulation and Kulisch Escape Route