pith. sign in

arxiv: 1807.07104 · v5 · pith:NQJJ62NKnew · submitted 2018-07-18 · 💻 cs.CL

Hierarchical Multi Task Learning With CTC

classification 💻 cs.CL
keywords hierarchicalmodeltrainingdifferentintermediatemulti-taskrepresentationssame
0
0 comments X
read the original abstract

In Automatic Speech Recognition it is still challenging to learn useful intermediate representations when using high-level (or abstract) target units such as words. For that reason, character or phoneme based systems tend to outperform word-based systems when just few hundreds of hours of training data are being used. In this paper, we first show how hierarchical multi-task training can encourage the formation of useful intermediate representations. We achieve this by performing Connectionist Temporal Classification at different levels of the network with targets of different granularity. Our model thus performs predictions in multiple scales for the same input. On the standard 300h Switchboard training setup, our hierarchical multi-task architecture exhibits improvements over single-task architectures with the same number of parameters. Our model obtains 14.0% Word Error Rate on the Eval2000 Switchboard subset without any decoder or language model, outperforming the current state-of-the-art on acoustic-to-word models.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Cross-Attention End-to-End ASR for Two-Party Conversations

    eess.AS 2019-07 unverdicted novelty 6.0

    End-to-end ASR model with speaker-specific cross-attention for two-party conversations outperforms standard models on the Switchboard corpus.