A noisy top-k gated mixture-of-experts layer between LSTMs scales neural networks to 137B parameters with sub-linear compute, beating SOTA on language modeling and machine translation.
Deep Speech 2: End-to-End Speech Recognition in English and Man- darin
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
background 2
citation-polarity summary
verdicts
ACCEPT 2roles
background 2polarities
background 2representative citing papers
The paper categorizes five concrete AI safety problems arising from flawed objectives, costly evaluation, and learning dynamics.
citing papers explorer
-
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
A noisy top-k gated mixture-of-experts layer between LSTMs scales neural networks to 137B parameters with sub-linear compute, beating SOTA on language modeling and machine translation.
-
Concrete Problems in AI Safety
The paper categorizes five concrete AI safety problems arising from flawed objectives, costly evaluation, and learning dynamics.