Sparse GPU Kernels for Deep Learning
2 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.DC (2)
verdicts: UNVERDICTED (2)
representative citing papers
SHIRO achieves geometric mean speedups of 221.5x to 8.8x over four baselines in distributed SpMM on up to 128 GPUs by exploiting sparsity patterns and two-tier network topologies.
citing papers explorer
- Sparsity-Aware Roofline Models for Sparse Matrix-Matrix Multiplication
  Sparsity-aware roofline models are required for accurate SpMM performance prediction because matrix structure alters arithmetic intensity and a single unified model fails across patterns like block, banded, scale-free, and random.
- SHIRO: Near-Optimal Communication Strategies for Distributed Sparse Matrix Multiplication
  SHIRO achieves geometric mean speedups of 221.5x to 8.8x over four baselines in distributed SpMM on up to 128 GPUs by exploiting sparsity patterns and two-tier network topologies.