Let it flow: Resilient asymmetric load balancing with flowlet switching
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.NI (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Resilient AI Supercomputer Networking using MRC and SRv6
MRC protocol with multi-plane Clos and SRv6 enables large AI training clusters to continue jobs through network failures that previously halted training.