Actions speak louder than words: Trillion-parameter sequential transducers for generative recommendations, 2024a
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
FreeScale: Distributed Training for Sequence Recommendation Models with Minimal Scaling Cost
FreeScale reduces computational bubbles by up to 90.3% in distributed training of sequence recommendation models on 256 H100 GPUs via load balancing, prioritized embedding overlap, and SM-Free communication.