Analysing the impact of sequence composition on language model pre-training
1 Pith paper cites this work.
Fields: cs.DC (1). Years: 2026 (1). Verdicts: UNVERDICTED (1).
Representative citing papers:
FlashCP: Load-Balanced Communication-Efficient Context Parallelism for LLM Training
FlashCP introduces Whole-Doc sharding, sharding-aware KV communication, and a heuristic for mixed sharding plans, claiming up to 1.63x speedup over prior CP methods for LLM training.