GSPMD automatically infers tensor partitioning from limited user annotations to parallelize single-device ML programs across thousands of TPUs, reporting 50-62% utilization for up to trillion-parameter models.
2 Pith papers cite this work.
Verdicts: 2 unverdicted
Citing papers
- GSPMD: General and Scalable Parallelization for ML Computation Graphs
  GSPMD automatically infers tensor partitioning from limited user annotations to parallelize single-device ML programs across thousands of TPUs, reporting 50-62% utilization for up to trillion-parameter models.
- GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
  GShard supplies automatic sharding and conditional computation support that enabled training a 600-billion-parameter multilingual translation model on thousands of TPUs with superior quality.