Numerical behavior of NVIDIA tensor cores

[Online] · 2021 · DOI 10.7717/peerj-cs.330

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Bit-Accurate Modeling of GPU Matrix Multiply-Accumulate Units: Demystifying Numerical Discrepancy and Accuracy

cs.AR · 2025-11-14 · accept · novelty 8.0

The authors derive the first bit-accurate arithmetic models for matrix multiply-accumulate operations on ten GPU architectures spanning NVIDIA Volta to Blackwell and AMD CDNA1 to CDNA3.

Microbenchmark-Driven Analytical Performance Modeling Across Modern GPU Architectures

cs.DC · 2026-05-05 · unverdicted · novelty 5.0

Microbenchmark-driven analytical models for B200 and MI300A achieve 1.31% and 0.09% MAE on validation kernels, far outperforming roofline baselines exceeding 95% error.

Analysis of Floating-Point Matrix Multiplication Computed via Integer Arithmetic

math.NA · 2025-06-12 · unverdicted · novelty 5.0

Error analysis and cost estimator for recasting floating-point matrix multiplication as accumulated integer products on mixed-precision hardware.

Optimizing Semiconductor Device Simulations through Low-Precision Arithmetic

cs.CE · 2026-06-24 · unverdicted · novelty 4.0

The quatrex quantum transport solver achieves up to 51% higher throughput using low-precision formats while maintaining accuracy on realistic semiconductor systems.

citing papers explorer

Showing 2 of 2 citing papers after filters.

Microbenchmark-Driven Analytical Performance Modeling Across Modern GPU Architectures

cs.DC · 2026-05-05 · unverdicted · none · ref 26

Microbenchmark-driven analytical models for B200 and MI300A achieve 1.31% and 0.09% MAE on validation kernels, far outperforming roofline baselines exceeding 95% error.

Optimizing Semiconductor Device Simulations through Low-Precision Arithmetic

cs.CE · 2026-06-24 · unverdicted · none · ref 24

The quatrex quantum transport solver achieves up to 51% higher throughput using low-precision formats while maintaining accuracy on realistic semiconductor systems.
