
arXiv: 2110.14883 · v3 · pith:STSXVC2Ynew · submitted 2021-10-28 · 💻 cs.LG · cs.AI · cs.CL · cs.CV · cs.DC

Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training

classification 💻 cs.LG cs.AI cs.CL cs.CV cs.DC
keywords training, parallel, colossal-ai, deep, learning, system, large-scale, methods

The success of Transformer models has pushed the scale of deep learning models to billions of parameters. Because a single GPU has limited memory, training models of this size requires distributing the work across multiple devices. However, best practice for choosing the optimal parallel strategy is still lacking, since it requires domain expertise in both deep learning and parallel computing. The Colossal-AI system addresses this challenge by introducing a unified interface for scaling sequential model-training code to distributed environments. It supports parallel training methods such as data, pipeline, tensor, and sequence parallelism, as well as heterogeneous training methods integrated with the Zero Redundancy Optimizer (ZeRO). Compared to the baseline system, Colossal-AI can achieve up to 2.76 times training speedup on large-scale models.
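To make two of the parallelism styles named in the abstract concrete, here is a minimal sketch in plain Python. It is not Colossal-AI's API: the function names (`matmul`, `data_parallel`, `tensor_parallel`) are hypothetical, and list slices merely stand in for the shards that separate GPUs would hold. It shows that splitting a linear layer's input along the batch dimension (data parallelism) or its weight along the output columns (one simple form of tensor parallelism) gives the same result as the unsplit computation.

```python
# Illustrative only: nested lists stand in for tensors, and each loop
# iteration stands in for one worker (GPU). No real communication happens.

def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m), both given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def data_parallel(batch, weight, num_workers):
    """Each worker keeps the full weight and processes a slice of the batch rows."""
    shard = (len(batch) + num_workers - 1) // num_workers  # rows per worker
    outputs = []
    for w in range(num_workers):
        rows = batch[w * shard:(w + 1) * shard]
        if rows:
            # Results are gathered along the batch dimension.
            outputs.extend(matmul(rows, weight))
    return outputs

def tensor_parallel(batch, weight, num_workers):
    """Each worker keeps a slice of the weight columns and sees the full batch."""
    cols = len(weight[0])
    shard = (cols + num_workers - 1) // num_workers  # columns per worker
    partials = []
    for w in range(num_workers):
        w_slice = [row[w * shard:(w + 1) * shard] for row in weight]
        if w_slice[0]:
            partials.append(matmul(batch, w_slice))
    # Results are concatenated along the feature dimension.
    return [sum((p[i] for p in partials), []) for i in range(len(batch))]

batch = [[1, 2], [3, 4], [5, 6]]   # 3 samples, 2 features
weight = [[1, 0, 2], [0, 1, 3]]    # 2 inputs, 3 outputs
reference = matmul(batch, weight)  # single-device result

print(data_parallel(batch, weight, 2) == reference)
print(tensor_parallel(batch, weight, 2) == reference)
```

In data parallelism every worker needs memory for the whole weight, whereas the column split reduces per-worker weight memory at the cost of a concatenation step; this kind of trade-off is what makes the choice of strategy depend on the model and the hardware.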

This paper has not been read by Pith yet.


Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Scaling Data-Constrained Language Models

    cs.CL 2023-05 conditional novelty 6.0

    Repeating training data up to 4 epochs yields negligible loss increase versus unique data for fixed compute, and a new scaling law accounts for the decaying value of repeated tokens and excess parameters.

  2. A Survey of Large Language Models

    cs.CL 2023-03 accept novelty 3.0

    This survey reviews the background, key techniques, and evaluation methods for large language models, emphasizing emergent abilities that appear at large scales.

  3. A Comprehensive Overview of Large Language Models

    cs.CL 2023-07 unverdicted novelty 2.0

    A survey paper providing an overview of Large Language Models, their background, and recent advances in the field.