On the Inductive Bias of Word-Character-Level Multi-Task Learning for Speech Recognition

Jan Kremer; Lars Maaløe; Lasse Borgholt

arXiv: 1812.02308 · v1 · pith:6FMJS3RVnew · submitted 2018-11-28 · cs.CL · cs.LG · stat.ML




keywords: model, character-level, supervision, word-level, words, bias, end-to-end, inductive



End-to-end automatic speech recognition (ASR) commonly transcribes audio signals into sequences of characters, while its performance is evaluated by measuring the word error rate (WER). This suggests that predicting sequences of words directly may be helpful instead. However, training with word-level supervision can be more difficult due to the sparsity of examples per label class. In this paper, we analyze an end-to-end ASR model that combines word-level and character-level representations in a multi-task learning (MTL) framework. We show that it improves the WER, and we study how the word-level model can benefit from character-level supervision by empirically analyzing the learned inductive preference bias of each model component. We find that by adding character-level supervision, the MTL model interpolates between recognizing more frequent words (preferred by the word-level model) and shorter words (preferred by the character-level model).
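The mismatch the abstract points to, between character-level output and word-level evaluation, can be made concrete with a minimal sketch of the standard WER computation: the word-level edit distance between reference and hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are illustrative assumptions, not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Assumption: words are separated by whitespace; no text normalization is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# A single wrong character makes the whole word count as an error.
wer_one_char_off = word_error_rate("the cat sat", "the cot sat")
```

Under this metric a hypothesis that differs from the reference by one character is charged a full word error, which is one way to see why supervising a model on whole words may align more directly with the evaluation criterion.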
