Elaswave: An elastic-native system for scalable hybrid-parallel training, 2025
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.DC (1). Years: 2026 (1). Verdicts: UNVERDICTED (1).
Representative citing papers:
LiveR: Fine-Grained Elasticity via Live Reconfiguration for Model Training
LiveR enables live reconfiguration for elastic LLM training by asynchronously preparing new parallel worlds and streaming reshaped model state over interconnects, reducing downtime to seconds and achieving 14-23x faster reconfiguration than checkpoint/restart.
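The summary mentions streaming reshaped model state from the old parallel layout to the new one. The following is a minimal sketch of what "reshaping" involves. It assumes a single flat parameter sharded contiguously across ranks. The helper names `shard_bounds`, `reshard_plan` and `apply_plan` are hypothetical, and this is not LiveR's actual implementation. Under that assumption, changing the number of ranks reduces to intersecting old and new shard intervals, and only the overlapping slices need to move:

```python
from typing import List, Tuple


def shard_bounds(n: int, world: int) -> List[Tuple[int, int]]:
    """Contiguous [start, end) ranges when n elements are split across `world` ranks.
    The first n % world ranks each get one extra element (hypothetical layout)."""
    base, extra = divmod(n, world)
    bounds, start = [], 0
    for r in range(world):
        size = base + (1 if r < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def reshard_plan(n: int, old_world: int, new_world: int) -> List[Tuple[int, int, int, int]]:
    """Return (src_rank, dst_rank, start, end) transfers that move every element
    from its owner in the old layout to its owner in the new layout."""
    old, new = shard_bounds(n, old_world), shard_bounds(n, new_world)
    plan, i, j = [], 0, 0
    # Two-pointer sweep over two sorted partitions of [0, n).
    while i < len(old) and j < len(new):
        lo, hi = max(old[i][0], new[j][0]), min(old[i][1], new[j][1])
        if lo < hi:  # non-empty overlap -> one transfer
            plan.append((i, j, lo, hi))
        # Advance whichever shard ends first.
        if old[i][1] <= new[j][1]:
            i += 1
        else:
            j += 1
    return plan


def apply_plan(old_shards: List[list], n: int, old_world: int, new_world: int) -> List[list]:
    """Simulate executing the plan locally; a real system would instead issue
    point-to-point sends over the interconnect for each transfer."""
    old_b, new_b = shard_bounds(n, old_world), shard_bounds(n, new_world)
    new_shards = [[None] * (e - s) for s, e in new_b]
    for src, dst, lo, hi in reshard_plan(n, old_world, new_world):
        # Convert global offsets into each rank's local buffer offsets.
        chunk = old_shards[src][lo - old_b[src][0]:hi - old_b[src][0]]
        new_shards[dst][lo - new_b[dst][0]:hi - new_b[dst][0]] = chunk
    return new_shards


if __name__ == "__main__":
    n, old_world, new_world = 12, 4, 3
    data = list(range(n))
    old_shards = [data[s:e] for s, e in shard_bounds(n, old_world)]
    new_shards = apply_plan(old_shards, n, old_world, new_world)
    # The resharded buffers match a fresh sharding of the same data.
    assert new_shards == [data[s:e] for s, e in shard_bounds(n, new_world)]
    for step in reshard_plan(n, old_world, new_world):
        print(step)  # (src_rank, dst_rank, start, end)
```

Shrinking from 4 to 3 ranks here yields six transfers, and each destination rank pulls from at most two source ranks. Because every transfer is independent, a system could in principle overlap them with ongoing training rather than stopping to write and reload a checkpoint.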