Super tiny language models. arXiv preprint arXiv:2405.14159
2 Pith papers cite this work.
citing papers explorer
- All is Not Lost: LLM Recovery without Checkpoints
  CheckFree recovers intermediate stage failures in pipeline-parallel LLM training via neighbor averaging; CheckFree+ adds out-of-order execution to handle first/last stages by copying neighbors, with small embedding storage, outperforming checkpointing and redundancy at 5-10% failure rates by up to
- Stochasticity in Tokenisation Improves Robustness
  Stochastic tokenisation during pre-training and fine-tuning improves LLM robustness to perturbations while preserving accuracy.