Simulation study shows cold TLB misses in reverse address translation dominate latency for small collectives in multi-GPU pods, causing up to 1.4x degradation, while larger ones see diminishing returns.
Tallent, and Kevin J
3 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 3
roles
background 2
polarities
background 2
representative citing papers
GPIR achieves up to 297 times higher throughput than prior GPU PIR systems by fusing operations in stages and using pipelined transposed layouts to cut DRAM traffic during batched lattice-based queries.
A survey categorizing vendor mechanisms and user-level libraries for GPU-centric communication within and across nodes, with discussion of benefits, challenges, and open questions.
citing papers explorer
- Analyzing Reverse Address Translation Overheads in Multi-GPU Scale-Up Pods
  Simulation study shows cold TLB misses in reverse address translation dominate latency for small collectives in multi-GPU pods, causing up to 1.4x degradation, while larger ones see diminishing returns.
- GPIR: Enabling Practical Private Information Retrieval with GPUs
  GPIR achieves up to 297 times higher throughput than prior GPU PIR systems by fusing operations in stages and using pipelined transposed layouts to cut DRAM traffic during batched lattice-based queries.
- The Landscape of GPU-Centric Communication
  A survey categorizing vendor mechanisms and user-level libraries for GPU-centric communication within and across nodes, with discussion of benefits, challenges, and open questions.