Triton-distributed: Programming Overlapping Kernels on Distributed AI Systems with the Triton Compiler

11 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: background (1)

fields: cs.DC (10), cs.AR (1)

years: 2026 (9), 2025 (2)

roles: background (1)

polarities: background (1)

representative citing papers

Eliminating Hidden Serialization in Multi-Node Megakernel Communication

cs.DC · 2026-05-01 · conditional · novelty 6.0

Perseus removes serialization bottlenecks in multi-node megakernel MoE communication via batched per-destination fences and hardware fence flags, delivering up to 10.3x speedup on proxy transports and matching or exceeding GPU-direct RDMA.

DMA-Latte: Expanding the Reach of DMA Offloads to Latency-bound ML Communication

cs.DC · 2025-11-10 · unverdicted · novelty 6.0

DMA offloads on AMD MI300X GPUs are extended to latency-bound ML communication using untapped hardware features, closing a performance gap of up to 4.5x versus RCCL in collectives and delivering up to 1.5x lower latency and 1.9x higher throughput in LLM inference over vLLM.
