Jigsaw: Accelerating spmm with vector sparsity on sparse tensor core
2 Pith papers cite this work. Polarity classification is still indexing.
Citation summary:
fields: cs.DC (2)
years: 2026 (2)
roles: background (1)
polarities: background (1)
citing papers explorer
- AsyncSparse: Accelerating Sparse Matrix-Matrix Multiplication on Asynchronous GPU Architectures
  AsyncSparse presents BCSR and WCSR kernels that use TMA and warp specialization to accelerate SpMM, outperforming prior libraries by 1.47x to 6.24x on SuiteSparse and achieving a 2.66x end-to-end speedup on Qwen2.5-7B at 90% block sparsity.
- Extending Contract Verification for Parallel Programming Models to Fortran
  CoVer extended to Fortran preserves analysis accuracy, reveals a bug in MPI-BugBench, and runs substantially faster than MUST while supporting multiple languages.