First distributed performance-portable NUFFT scales to 1024 GPUs on heterogeneous systems and supports large particle-in-Fourier plasma simulations.
In 2021 IEEE Int
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts: UNVERDICTED 4
roles: background 1
polarities: background 1
citing papers explorer
- A Performance-Portable, Massively Parallel Distributed Nonuniform FFT
  First distributed performance-portable NUFFT scales to 1024 GPUs on heterogeneous systems and supports large particle-in-Fourier plasma simulations.
- PackSELL: A Sparse Matrix Format for Precision-Agnostic High-Performance SpMV
  PackSELL packs delta-encoded indices and values into single words with tunable bit allocation, delivering up to 1.63x faster FP16 SpMV and FP32-accurate performance exceeding FP16 cuSPARSE while reducing memory traffic.
- On the Role of DAG topology in Energy-Aware Cloud Scheduling: A GNN-Based Deep Reinforcement Learning Approach
  GNN-DRL cloud schedulers for DAG workflows degrade under topology shifts because structural mismatches disrupt message passing and policy generalization.
- The Landscape of GPU-Centric Communication
  A survey categorizing vendor mechanisms and user-level libraries for GPU-centric communication within and across nodes, with discussion of benefits, challenges, and open questions.