Perseus removes serialization bottlenecks in multi-node megakernel MoE communication via batched per-destination fences and hardware fence flags, delivering up to 10.3x speedup on proxy transports and matching or exceeding GPU-direct RDMA.
Comet: Fine-grained computation-communication overlapping for mixture-of-experts. Proceedings of Machine Learning and Systems (MLSys 25), 7
1 Pith paper cites this work.
Fields: cs.DC (1)
Years: 2026 (1)
Verdicts: CONDITIONAL (1)
Representative citing papers
Eliminating Hidden Serialization in Multi-Node Megakernel Communication