The authors derive the first bit-accurate arithmetic models for matrix multiply-accumulate operations on ten GPU architectures spanning NVIDIA Volta to Blackwell and AMD CDNA1 to CDNA3.
Numerical behavior of NVIDIA tensor cores
4 Pith papers cite this work.
citation-role summary: background 1
citation-polarity summary: background 1
representative citing papers
Microbenchmark-driven analytical models for B200 and MI300A achieve 1.31% and 0.09% MAE on validation kernels, far outperforming roofline baselines, whose error exceeds 95%.
Error analysis and cost estimator for recasting floating-point matrix multiplication as accumulated integer products on mixed-precision hardware.
The quatrex quantum transport solver achieves up to 51% higher throughput using low-precision formats while maintaining accuracy on realistic semiconductor systems.
citing papers explorer
- Microbenchmark-Driven Analytical Performance Modeling Across Modern GPU Architectures
  Microbenchmark-driven analytical models for B200 and MI300A achieve 1.31% and 0.09% MAE on validation kernels, far outperforming roofline baselines, whose error exceeds 95%.
- Optimizing Semiconductor Device Simulations through Low-Precision Arithmetic
  The quatrex quantum transport solver achieves up to 51% higher throughput using low-precision formats while maintaining accuracy on realistic semiconductor systems.