Deep Speech: Scaling up end-to-end speech recognition

12 Pith papers cite this work. Polarity classification is still indexing.

abstract

We present a state-of-the-art speech recognition system developed using end-to-end deep learning. Our architecture is significantly simpler than traditional speech systems, which rely on laboriously engineered processing pipelines; these traditional systems also tend to perform poorly when used in noisy environments. In contrast, our system does not need hand-designed components to model background noise, reverberation, or speaker variation, but instead directly learns a function that is robust to such effects. We do not need a phoneme dictionary, nor even the concept of a "phoneme." Key to our approach is a well-optimized RNN training system that uses multiple GPUs, as well as a set of novel data synthesis techniques that allow us to efficiently obtain a large amount of varied data for training. Our system, called Deep Speech, outperforms previously published results on the widely studied Switchboard Hub5'00, achieving 16.0% error on the full test set. Deep Speech also handles challenging noisy environments better than widely used, state-of-the-art commercial speech systems.

citation-role summary

background: 2 · dataset: 1

representative citing papers

Gauge-covariant stochastic neural fields: Stability and finite-width effects

hep-th · 2025-08-26 · unverdicted · novelty 7.0

A gauge-covariant stochastic neural field theory is introduced that derives the maximal Lyapunov exponent and amplification factor, showing finite-width effects as perturbative corrections to dressed kernels that leave the marginality condition unchanged for fixed kernel geometry.

Deep Learning Scaling is Predictable, Empirically

cs.LG · 2017-12-01 · unverdicted · novelty 7.0

Deep learning generalization error follows power-law scaling with training set size across multiple domains, with model size scaling sublinearly with data size.

Mixed Precision Training

cs.AI · 2017-10-10 · accept · novelty 7.0

Mixed precision training uses FP16 for most computations, FP32 master weights for accumulation, and loss scaling to enable accurate training of large DNNs with halved memory usage.

Sink or SWIM: Tackling Real-Time ASR at Scale

cs.SD · 2026-01-22 · unverdicted · novelty 6.0

SWIM scales Whisper ASR to 20 concurrent multilingual clients via buffer merging, achieving ~2.4s delay at 5 clients versus 3.4s for single-client baselines while preserving accuracy.
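
The Mixed Precision Training entry above names three ingredients: FP16 computation, FP32 master weights, and loss scaling. The following is a minimal sketch of why the last two matter, not the cited paper's implementation. It uses only the Python standard library to round values to half and single precision; the helper names, the constant `LOSS_SCALE`, and the example gradient values are assumptions chosen for illustration.

```python
import struct

def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (FP16) value."""
    return struct.unpack("e", struct.pack("e", x))[0]

def to_fp32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision (FP32) value."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Hypothetical constant scale factor; real systems may tune or adapt it.
LOSS_SCALE = 1024.0

# 1) Loss scaling: a tiny gradient underflows to zero in FP16 ...
true_grad = 1e-8
unscaled_fp16 = to_fp16(true_grad)
# ... but survives if the loss (and hence the gradient) is scaled first,
# then divided by the same factor in higher precision.
scaled_fp16 = to_fp16(true_grad * LOSS_SCALE)
recovered = scaled_fp16 / LOSS_SCALE

# 2) FP32 master weights: a small update is lost if the weight itself
# is stored in FP16, but is kept in an FP32 master copy.
weight, lr, grad = 1.0, 1e-4, 0.5
updated_fp16 = to_fp16(weight - lr * grad)
updated_fp32 = to_fp32(weight - lr * grad)

print("unscaled FP16 gradient:", unscaled_fp16)
print("recovered gradient:    ", recovered)
print("FP16 weight after step:", updated_fp16)
print("FP32 weight after step:", updated_fp32)
```

In this sketch the unscaled gradient rounds to zero and the FP16 weight does not move, while the scaled gradient and the FP32 master weight both retain the update.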
