CSA-UD is a communication-semantic-aware, unreliable-datagram (UD) RDMA loss recovery mechanism that improves queue pair (QP) scalability and reduces 99th-percentile flow completion times in hyperscale AI training collectives.
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures
1 Pith paper cites this work.
Fields: cs.NI (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Communication-Semantic-Aware RDMA Loss Recovery for QP-scalable Hyperscale AI Training