Designing a Low-Latency Megakernel for Llama-1B

1 Pith paper cites this work. Polarity classification is still being indexed.

Fields: cs.DC (1)
Years: 2026 (1)
Verdicts: CONDITIONAL (1)

Representative citing papers

Eliminating Hidden Serialization in Multi-Node Megakernel Communication

cs.DC · 2026-05-01 · conditional · novelty 6.0 · ref 48

Perseus removes serialization bottlenecks in multi-node megakernel MoE communication via batched per-destination fences and hardware fence flags, delivering up to a 10.3x speedup on proxy transports and matching or exceeding GPU-direct RDMA.