Guaranteed DGEMM accuracy while using reduced precision tensor cores through extensions of the Ozaki scheme

2026 · arXiv 3656.377367

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: background (1)

citation-polarity summary: background (1)

representative citing papers

EmuGEMM: Fused Tensor Core Kernels for Precision Emulation in Matrix Multiplication

cs.DC · 2026-06-24 · unverdicted · novelty 6.0

Fused Tensor Core kernels for Ozaki Schemes I and II achieve up to 83% of INT8 peak throughput and outperform cuBLAS TF32 and ZGEMM on large matrices at comparable accuracy.

Exceeding the Numerical and Performance Characteristics of IEEE-754 SGEMM with BFloat16 Tensor Cores on GPUs for Scientific Computing

cs.DC · 2026-05-15 · conditional · novelty 6.0

BF16 tensor cores on GPUs emulate FP32 SGEMM with superior performance, power efficiency, and numerical accuracy compared to native FP32, including a library implementation that handles denormals.

Double-Precision Matrix Multiplication Emulation via Ozaki-II Scheme with FP8 Quantization

cs.DC · 2026-03-11 · unverdicted · novelty 6.0

An adaptation of the Ozaki-II scheme allows DGEMM emulation on FP8 MMA units with significantly reduced computational cost compared to FP8-based Ozaki-I.

Optimizing Semiconductor Device Simulations through Low-Precision Arithmetic

cs.CE · 2026-06-24 · unverdicted · novelty 4.0

The quatrex quantum transport solver achieves up to 51% higher throughput using low-precision formats while maintaining accuracy on realistic semiconductor systems.

Post-Moore Technologies for Plasma Simulation: A Community Roadmap

cs.ET · 2026-05-08 · unverdicted · novelty 4.0

No single post-Moore technology replaces current HPC for plasma simulations. FPGA-class accelerators offer near-term kernel offload, non-von Neumann architectures offer medium-term operator acceleration, and quantum computing holds long-term potential for warm dense matter microphysics.
