ImageNet classification with deep convolutional neural networks
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background 1
citation-polarity summary
verdicts: UNVERDICTED 2
roles: background 1
polarities: background 1
representative citing papers
End-to-end 3D CNN with separable convolutions for efficient simultaneous spatial-temporal video object and action segmentation.
citing papers explorer
- GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
GShard supplies automatic sharding and conditional computation support that enabled training a 600-billion-parameter multilingual translation model on thousands of TPUs with superior quality.
- An Efficient 3D CNN for Action/Object Segmentation in Video
End-to-end 3D CNN with separable convolutions for efficient simultaneous spatial-temporal video object and action segmentation.