2 Pith papers cite this work.
citing papers explorer
- Cache Blocking of Distributed-Memory Parallel Matrix Power Kernels
  Introduces Distributed Level-Blocked MPK, which combines RACE cache blocking with MPI, and reports speedups of up to 4x on 832 cores for matrix power kernels across scientific sparse matrices.
- A Hermite-like basis for faster matrix-free evaluation of interior penalty discontinuous Galerkin operators
  A Hermite-like basis reduces neighbor data access in matrix-free DG SIP operators to one value and one derivative per neighbor on hexahedral elements via Jacobi roots and tensor products.