Sparse matrix-vector multiplication on GPGPUs
4 Pith papers cite this work. Polarity classification is still indexing.
verdicts: UNVERDICTED 4
roles: background 1
polarities: background 1
citing papers explorer
- PackSELL: A Sparse Matrix Format for Precision-Agnostic High-Performance SpMV
  PackSELL packs delta-encoded indices and values into single words with tunable bit allocation, delivering up to 1.63x faster FP16 SpMV and FP32-accurate performance exceeding FP16 cuSPARSE while reducing memory traffic.
- Accelerating Locality-Driven Integration in Quantum Chemistry with Block-Structured Matrix Multiplication
  KerneLDI accelerates exchange-correlation integration in Kohn-Sham DFT by up to 10x through block-structured matrix multiplication that exploits spatial locality on GPUs while preserving accuracy.
- CB-SpMV: A Data Aggregating and Balance Algorithm for Cache-Friendly Block-Based SpMV on GPUs
  CB-SpMV is a cache-friendly 2D block-based SpMV algorithm for GPUs that uses data aggregation via virtual pointers, sub-block format selection, and inter-block load balancing to deliver up to 3.95x average speedup over cuSPARSE-BSR, TileSpMV, and DASP on 2843 SuiteSparse matrices.
- Parallel matching-based AMG preconditioners for elliptic equations discretized by IgA
  Matching-based AMG preconditioners deliver robust and scalable performance for solving large ill-conditioned systems from IgA discretizations in parallel HPC settings.