Minimizing communication in sparse matrix solvers
2 Pith papers cite this work.
Citing papers:
- Cache Blocking of Distributed-Memory Parallel Matrix Power Kernels
  Introduces Distributed Level-Blocked MPK, which combines RACE cache blocking with MPI, and reports speedups of up to 4x on 832 cores for matrix power kernels (MPK) across a set of scientific sparse matrices.
- Algebraic Temporal Blocking for Sparse Iterative Solvers on Multi-Core CPUs
  Integrates RACE into Trilinos to apply algebraic temporal blocking to the MPK in s-step GMRES, polynomial preconditioners, and algebraic multigrid (AMG), yielding speedups of up to 3x on multi-core CPUs for MPK-dominated algorithms.
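To make the technique in these summaries more concrete, here is a minimal single-process sketch of level-based temporal blocking for a matrix power kernel, which computes x, Ax, ..., A^k x. It is not RACE's or either paper's implementation. The function names (`spmv_rows`, `mpk_naive`, `bfs_levels`, `mpk_level_blocked`, `laplacian_1d`), the plain-list CSR storage, the BFS-based level construction, and the assumption of a structurally symmetric matrix are all illustrative choices. The blocked version produces exactly the same vectors as the naive one; only the traversal order changes.

```python
from collections import deque


def laplacian_1d(n):
    """Hypothetical test matrix: 1D Laplacian tridiag(-1, 2, -1) in CSR form."""
    indptr, indices, data = [0], [], []
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                indices.append(j)
                data.append(v)
        indptr.append(len(indices))
    return indptr, indices, data


def spmv_rows(indptr, indices, data, x, rows, y):
    """Write (A @ x)[r] into y[r] for each row index r in rows."""
    for r in rows:
        acc = 0.0
        for jj in range(indptr[r], indptr[r + 1]):
            acc += data[jj] * x[indices[jj]]
        y[r] = acc


def mpk_naive(indptr, indices, data, x, k):
    """Baseline MPK: [x, Ax, ..., A^k x] via k full sweeps over the matrix."""
    n = len(x)
    ys = [list(x)] + [[0.0] * n for _ in range(k)]
    for p in range(1, k + 1):
        spmv_rows(indptr, indices, data, ys[p - 1], range(n), ys[p])
    return ys


def bfs_levels(indptr, indices, n):
    """Group rows into BFS levels of the matrix graph.

    Assumes a structurally symmetric matrix, so a row in level L only
    references columns in levels L-1, L and L+1.
    """
    level = [-1] * n
    for start in range(n):  # restart BFS for each disconnected component
        if level[start] != -1:
            continue
        level[start] = 0
        queue = deque([start])
        while queue:
            r = queue.popleft()
            for jj in range(indptr[r], indptr[r + 1]):
                c = indices[jj]
                if level[c] == -1:
                    level[c] = level[r] + 1
                    queue.append(c)
    groups = [[] for _ in range(max(level, default=-1) + 1)]
    for r in range(n):
        groups[level[r]].append(r)
    return groups


def mpk_level_blocked(indptr, indices, data, x, k):
    """Level-blocked MPK: same result as mpk_naive, different traversal order.

    At wavefront step j, level l = j - (p - 1) is advanced to power p. Its
    inputs (levels l-1, l, l+1 at power p-1) were produced at steps j-2, j-1
    and j (lower p first), so every dependency is already available.
    """
    n = len(x)
    groups = bfs_levels(indptr, indices, n)
    ys = [list(x)] + [[0.0] * n for _ in range(k)]
    for j in range(len(groups) + k - 1):
        for p in range(1, k + 1):
            l = j - (p - 1)
            if 0 <= l < len(groups):
                spmv_rows(indptr, indices, data, ys[p - 1], groups[l], ys[p])
    return ys


indptr, indices, data = laplacian_1d(6)
ys = mpk_level_blocked(indptr, indices, data, [1.0] * 6, 2)
print(ys[1])  # [1.0, 0.0, 0.0, 0.0, 0.0, 1.0]
```

In a compiled implementation, the point of this order is that the rows of `ys[p - 1]` for neighboring levels are still in cache when the next power is computed, instead of streaming the whole matrix and vector k times. RACE additionally groups levels to fit a target cache size, and a distributed-memory variant must also handle MPI halo exchanges; neither is modeled here, and no speedup should be expected in pure Python.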