arXiv preprint arXiv:2508.00441
2 Pith papers cite this work (by year: 2026, 2; verdicts: 2 unverdicted).

Representative citing papers:
Fused Tensor Core kernels for Ozaki Schemes I and II achieve up to 83% of INT8 peak throughput and outperform cuBLAS TF32 and ZGEMM on large matrices at comparable accuracy.
Coalesced Matrix-Free Finite Elements in Cell-Wise Storage
Using cell-wise storage as the primary data layout for continuous high-order FEM enables exact matrix-free conjugate gradient (CG) iteration on GPUs: communication is confined to the preconditioner, and direct stiffness summation (DSS) is realized through sequential per-axis face exchanges, without gather-scatter or atomics.
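To make the idea of direct stiffness summation (DSS) through sequential per-axis face exchanges concrete, here is a minimal sketch, assuming a 2D structured grid of tensor-product elements of degree p stored cell-wise as an array of shape (Ex, Ey, p + 1, p + 1), so every element owns its own copy of shared face nodes. The names `dss_axis_sweeps` and `dss_reference` and the array layout are hypothetical illustrations, not the paper's implementation. The sweep version sums shared x-faces first and then y-faces; the reference version does the conventional gather-scatter through a global node array so the two can be compared.

```python
import numpy as np


def dss_axis_sweeps(u: np.ndarray) -> np.ndarray:
    """DSS on cell-wise storage, one axis at a time (hypothetical layout).

    u has shape (Ex, Ey, p + 1, p + 1); element (ex, ey) keeps its own copy
    of every node, including nodes on faces shared with neighbours.
    """
    u = u.copy()
    # x pass: the right face (i = p) of element ex coincides with the left
    # face (i = 0) of element ex + 1. Both copies receive the pairwise sum.
    s = u[:-1, :, -1, :] + u[1:, :, 0, :]
    u[:-1, :, -1, :] = s
    u[1:, :, 0, :] = s
    # y pass: same exchange across top/bottom faces. Nodes on x-faces already
    # hold pairwise x-sums, so corners shared by four elements end up with
    # the full four-way sum without a separate corner exchange.
    s = u[:, :-1, :, -1] + u[:, 1:, :, 0]
    u[:, :-1, :, -1] = s
    u[:, 1:, :, 0] = s
    return u


def dss_reference(u: np.ndarray) -> np.ndarray:
    """Reference DSS: scatter-add into a global node array, then gather back."""
    ne_x, ne_y, n, _ = u.shape
    p = n - 1
    ex, ey, i, j = np.meshgrid(
        np.arange(ne_x), np.arange(ne_y), np.arange(n), np.arange(n), indexing="ij"
    )
    gx, gy = ex * p + i, ey * p + j  # global node coordinates of each cell copy
    g = np.zeros((ne_x * p + 1, ne_y * p + 1))
    np.add.at(g, (gx, gy), u)  # scatter-add: shared nodes receive several terms
    return g[gx, gy]  # gather the assembled values back to every cell copy


rng = np.random.default_rng(0)
u = rng.standard_normal((3, 4, 5, 5))  # 3 x 4 elements of degree p = 4
assert np.allclose(dss_axis_sweeps(u), dss_reference(u))
```

Within each pass of this sketch, every face pair writes to its own disjoint slices, so all pairs along one axis could be updated in parallel with no write conflicts. The reference version relies on an unbuffered scatter-add (`np.add.at`), which is the step that usually needs atomics on a GPU.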