PackSELL packs delta-encoded indices and values into single words with tunable bit allocation, delivering up to 1.63x faster FP16 SpMV and FP32-accurate performance exceeding FP16 cuSPARSE while reducing memory traffic.
Title resolution pending
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 4roles
background 2polarities
background 2representative citing papers
A fused gather-GEMM-scatter CUDA kernel achieves 4.6-7.3x end-to-end speedup and 3.2-4.9x lower energy for matrix-free 3D SIMP topology optimization on RTX 4090 compared to three-stage baselines.
Neural operators supply warm-start guesses that cut iteration counts and runtime by up to 90% in Krylov solvers for PDEs while retaining the original methods' convergence guarantees.
Error analysis and cost estimator for recasting floating-point matrix multiplication as accumulated integer products on mixed-precision hardware.
citing papers explorer
-
PackSELL: A Sparse Matrix Format for Precision-Agnostic High-Performance SpMV
PackSELL packs delta-encoded indices and values into single words with tunable bit allocation, delivering up to 1.63x faster FP16 SpMV and FP32-accurate performance exceeding FP16 cuSPARSE while reducing memory traffic.
-
Matrix-Free 3D SIMP Topology Optimization with Fused Gather-GEMM-Scatter Kernels
A fused gather-GEMM-scatter CUDA kernel achieves 4.6-7.3x end-to-end speedup and 3.2-4.9x lower energy for matrix-free 3D SIMP topology optimization on RTX 4090 compared to three-stage baselines.
-
NOWS: Neural Operator Warm Starts for Accelerating Iterative Solvers
Neural operators supply warm-start guesses that cut iteration counts and runtime by up to 90% in Krylov solvers for PDEs while retaining the original methods' convergence guarantees.
-
Analysis of Floating-Point Matrix Multiplication Computed via Integer Arithmetic
Error analysis and cost estimator for recasting floating-point matrix multiplication as accumulated integer products on mixed-precision hardware.