Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training

Boxiang Wang; Haichen Huang; Hongxin Liu; Jiarui Fang; Shenggui Li; Yang You; Yuliang Liu; Zhengda Bian

arxiv: 2110.14883 · v3 · pith:STSXVC2Ynew · submitted 2021-10-28 · cs.LG · cs.AI · cs.CL · cs.CV · cs.DC

Shenggui Li, Hongxin Liu, Zhengda Bian, Jiarui Fang, Haichen Huang, Yuliang Liu, Boxiang Wang, Yang You

classification cs.LG · cs.AI · cs.CL · cs.CV · cs.DC

keywords training · parallel · colossal-ai · deep · learning · system · large-scale · methods

The success of Transformer models has pushed the scale of deep learning models to billions of parameters. Because the memory of a single GPU is limited, models of this size have to be trained across multiple GPUs. However, best practice for choosing the optimal parallel strategy is still lacking, since it requires domain expertise in both deep learning and parallel computing. The Colossal-AI system addresses this challenge by introducing a unified interface for scaling sequential model-training code to distributed environments. It supports parallel training methods such as data, pipeline, tensor, and sequence parallelism, as well as heterogeneous training methods integrated with the zero redundancy optimizer. Compared to the baseline system, Colossal-AI can achieve up to a 2.76 times training speedup on large-scale models.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Scaling Data-Constrained Language Models
cs.CL 2023-05 conditional novelty 6.0

Repeating training data for up to 4 epochs yields a negligible loss increase versus unique data at fixed compute, and a new scaling law accounts for the decaying value of repeated tokens and excess parameters.

A Survey of Large Language Models
cs.CL 2023-03 accept novelty 3.0

This survey reviews the background, key techniques, and evaluation methods for large language models, emphasizing emergent abilities that appear at large scales.

A Comprehensive Overview of Large Language Models
cs.CL 2023-07 unverdicted novelty 2.0

A survey paper that gives an overview of large language models, their background, and recent advances in the field.