Gadei: On scale-up training as a service for deep learning
1 Pith paper cites this work. Polarity classification is still indexing.
fields: eess.AS (1)
years: 2019 (1)
verdicts: UNVERDICTED (1)
representative citing papers
- A Highly Efficient Distributed Deep Learning System For Automatic Speech Recognition
ADPSGD and Hierarchical-ADPSGD support 3x larger batches than SSGD for ASR, training SWB-2000 to 7.6% WER on SWB and 13.2% on CH in 5.2 hours on 64 V100 GPUs.