pith. sign in

Canonical reference

MegaScale-MoE:Large-ScaleCommunication- Efficient Training of Mixture-of-Experts Models in Production

Canonical reference. 80% of citing Pith papers cite this work as background.

9 Pith papers citing it
Background 80% of classified citations

citation-role summary

background 4 method 1

citation-polarity summary

years

2026 8 2025 1

representative citing papers

Efficient Training on Multiple Consumer GPUs with RoundPipe

cs.DC · 2026-04-29 · conditional · novelty 8.0

RoundPipe achieves near-zero-bubble pipeline parallelism for LLM training on consumer GPUs by dynamically dispatching computation stages round-robin, yielding 1.48-2.16x speedups and enabling 235B model fine-tuning on 8x RTX 4090.

Eliminating Hidden Serialization in Multi-Node Megakernel Communication

cs.DC · 2026-05-01 · conditional · novelty 6.0

Perseus removes serialization bottlenecks in multi-node megakernel MoE communication via batched per-destination fences and hardware fence flags, delivering up to 10.3x speedup on proxy transports and matching or exceeding GPU-direct RDMA.

citing papers explorer

Showing 9 of 9 citing papers.