Runtime compression of MPI messages to improve the performance and scalability of parallel applications

2010 · DOI 10.1109/sc

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

NCCLZ: Compression-Enabled GPU Collectives with Decoupled Quantization and Entropy Coding

cs.DC · 2026-05-12 · unverdicted · novelty 7.0

NCCLZ decouples quantization and entropy coding across NCCL stack layers to enable overlapped compression, delivering up to 9.65x speedup over plain NCCL on scientific and training workloads.

FusionRCG: Orchestrating Recursive Computation Graphs across GPU Memory Hierarchies

physics.comp-ph · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

FusionRCG uses liveness-aware graph orchestration, Cartesian-to-spherical fusion, and multi-tier kernels to cut intermediate data by up to 7.7x and deliver 3.09x SCF speedup on A100 GPUs.

RegDem: Increasing GPU Performance via Shared Memory Register Spilling

cs.PF · 2019-07-05 · unverdicted · novelty 6.0

RegDem translates SASS code to spill registers to shared memory, increasing occupancy and delivering 9% geometric mean speedup over nvcc with peaks of 18%.

Extending UNIQuE: Quantum Simulation Speedup for the HHL Algorithm

quant-ph · 2026-04-28 · unverdicted · novelty 4.0

Classical emulation of the HHL algorithm via extended UNIQuE scales exponentially only with qubit count and shows runtime advantage over state-vector simulation for small linear systems.

The Landscape of GPU-Centric Communication

cs.DC · 2024-09-15 · unverdicted · novelty 2.0

A survey categorizing vendor mechanisms and user-level libraries for GPU-centric communication within and across nodes, with discussion of benefits, challenges, and open questions.

citing papers explorer

Showing 5 of 5 citing papers.

NCCLZ: Compression-Enabled GPU Collectives with Decoupled Quantization and Entropy Coding

cs.DC · 2026-05-12 · unverdicted · none · ref 35

NCCLZ decouples quantization and entropy coding across NCCL stack layers to enable overlapped compression, delivering up to 9.65x speedup over plain NCCL on scientific and training workloads.

FusionRCG: Orchestrating Recursive Computation Graphs across GPU Memory Hierarchies

physics.comp-ph · 2026-05-11 · unverdicted · none · ref 33 · 2 links

FusionRCG uses liveness-aware graph orchestration, Cartesian-to-spherical fusion, and multi-tier kernels to cut intermediate data by up to 7.7x and deliver 3.09x SCF speedup on A100 GPUs.

RegDem: Increasing GPU Performance via Shared Memory Register Spilling

cs.PF · 2019-07-05 · unverdicted · none · ref 19

RegDem translates SASS code to spill registers to shared memory, increasing occupancy and delivering 9% geometric mean speedup over nvcc with peaks of 18%.

Extending UNIQuE: Quantum Simulation Speedup for the HHL Algorithm

quant-ph · 2026-04-28 · unverdicted · none · ref 19

Classical emulation of the HHL algorithm via extended UNIQuE scales exponentially only with qubit count and shows runtime advantage over state-vector simulation for small linear systems.

The Landscape of GPU-Centric Communication

cs.DC · 2024-09-15 · unverdicted · none · ref 36

A survey categorizing vendor mechanisms and user-level libraries for GPU-centric communication within and across nodes, with discussion of benefits, challenges, and open questions.