Surviving switch failures in cloud datacenters. SIGCOMM Comput
1 Pith paper cites this work.
Fields: cs.NI (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
- Resilient AI Supercomputer Networking using MRC and SRv6
The MRC protocol, combined with a multi-plane Clos topology and SRv6, lets large AI training clusters keep jobs running through network failures that previously halted training.