MSCCL++: Rethinking GPU Communication Abstractions for Cutting-edge AI Applications

MSCCL++: Rethinking GPU Communication Abstractions for Cutting-edge AI Applications · 2025 · arXiv 2504.09014

5 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

DITRON: Distributed Multi-level Tiling Compiler for Parallel Tensor Programs

cs.PL · 2026-05-02 · unverdicted · novelty 6.0

DITRON introduces a hierarchical multi-level tiling compiler for distributed tensor programs that matches or exceeds expert CUDA libraries with 6-30% speedups and has been deployed to improve training MFU by over 10% while saving hundreds of thousands of GPU hours monthly.

UCCL-Zip: Lossless Compression Supercharged GPU Communication

cs.DC · 2026-04-19 · unverdicted · novelty 6.0

UCCL-Zip adds lossless compression to GPU communication to reduce LLM bottlenecks while preserving exact numerical correctness.

Janus: Disaggregating Attention and Experts for Scalable MoE Inference

cs.DC · 2025-12-15 · unverdicted · novelty 6.0

JANUS disaggregates attention and MoE layers onto separate GPU pools with an expert-balancing scheduler and SLO-aware scaling, delivering up to 4.7x higher per-GPU throughput than prior MoE systems under token-level latency constraints.

DMA-Latte: Expanding the Reach of DMA Offloads to Latency-bound ML Communication

cs.DC · 2025-11-10 · unverdicted · novelty 6.0

DMA offloads on AMD MI300X GPUs are extended to latency-bound ML communication using untapped hardware features, closing a performance gap of up to 4.5x versus RCCL in collectives and delivering up to 1.5x lower latency and 1.9x higher throughput in LLM inference over vLLM.

UniEP: Unified Expert-Parallel MoE MegaKernel for LLM Training

cs.DC · 2026-04-21 · unverdicted · novelty 5.0

UniEP fuses MoE communication and computation into unified MegaKernels with deterministic token ordering, delivering 1.03x-1.38x speedups over prior work while preserving training accuracy.

citing papers explorer


DITRON: Distributed Multi-level Tiling Compiler for Parallel Tensor Programs · cs.PL · 2026-05-02 · unverdicted · none · ref 25

UCCL-Zip: Lossless Compression Supercharged GPU Communication · cs.DC · 2026-04-19 · unverdicted · none · ref 48

Janus: Disaggregating Attention and Experts for Scalable MoE Inference · cs.DC · 2025-12-15 · unverdicted · none · ref 20

DMA-Latte: Expanding the Reach of DMA Offloads to Latency-bound ML Communication · cs.DC · 2025-11-10 · unverdicted · none · ref 28

UniEP: Unified Expert-Parallel MoE MegaKernel for LLM Training · cs.DC · 2026-04-21 · unverdicted · none · ref 36
