MRC protocol with multi-plane Clos and SRv6 enables large AI training clusters to continue jobs through network failures that previously halted training.
Brighten Godfrey, Yashar Ganjali, and Amin Firoozshahian
Cited by 1 Pith paper (field: cs.NI; year: 2026; verdict: unverdicted). Representative citing paper:
Resilient AI Supercomputer Networking using MRC and SRv6