Sparse GPU Kernels for Deep Learning · 2020

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Sparsity-Aware Roofline Models for Sparse Matrix-Matrix Multiplication

cs.DC · 2026-04-08 · unverdicted · novelty 6.0

Sparsity-aware roofline models are required for accurate SpMM performance prediction because matrix structure alters arithmetic intensity and a single unified model fails across patterns like block, banded, scale-free, and random.

SHIRO: Near-Optimal Communication Strategies for Distributed Sparse Matrix Multiplication

cs.DC · 2025-12-23 · unverdicted · novelty 6.0

SHIRO achieves geometric mean speedups of 221.5x to 8.8x over four baselines in distributed SpMM on up to 128 GPUs by exploiting sparsity patterns and two-tier network topologies.

citing papers explorer

Showing 2 of 2 citing papers.

Sparsity-Aware Roofline Models for Sparse Matrix-Matrix Multiplication · cs.DC · 2026-04-08 · unverdicted · none · ref 10
SHIRO: Near-Optimal Communication Strategies for Distributed Sparse Matrix Multiplication · cs.DC · 2025-12-23 · unverdicted · none · ref 12
