End-to-End Speech Translation with Knowledge Distillation

Chengqing Zong; HaiFeng Wang; Hao Xiong; Hua Wu; Jiajun Zhang; Yuchen Liu; Zhongjun He

arxiv: 1904.08075 · v1 · pith:QKEEJ26Snew · submitted 2019-04-17 · 💻 cs.CL

Yuchen Liu, Hao Xiong, Zhongjun He, Jiajun Zhang, Hua Wu, HaiFeng Wang, Chengqing Zong

keywords: model, end-to-end, translation, knowledge, speech, text, distillation, language

End-to-end speech translation (ST), which translates source-language speech directly into target-language text, has attracted intensive attention in recent years. Compared to conventional pipeline systems, end-to-end ST models offer lower latency, smaller model size, and less error propagation. However, combining speech recognition and text translation in one model is more difficult than either task alone. In this paper, we propose a knowledge distillation approach that improves the ST model by transferring knowledge from a text translation model. Specifically, we first train a text translation model, regarded as the teacher model, and then train the ST model to learn the teacher model's output probabilities through knowledge distillation. Experiments on the English-French Augmented LibriSpeech and English-Chinese TED corpora show that end-to-end ST is feasible on both similar and dissimilar language pairs. In addition, with the guidance of the teacher model, the end-to-end ST model gains significant improvements of over 3.5 BLEU points.
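The abstract states only that the ST model learns the teacher's output probabilities. The following is a minimal sketch of how such a token-level distillation objective might look, not the authors' implementation. The function names, the `kd_weight` interpolation with the ground-truth loss, and the plain-list data layout are all assumptions made for illustration.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one token's vocabulary logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def distillation_loss(student_logits, teacher_probs, target_ids, kd_weight=0.5):
    """Hypothetical token-level distillation loss.

    student_logits: per-token lists of ST (student) logits over the vocabulary
    teacher_probs:  per-token lists of text-translation (teacher) probabilities
    target_ids:     reference token ids, one per position
    kd_weight:      assumed interpolation weight (not given in the abstract)
    """
    nll = 0.0  # negative log-likelihood against the reference tokens
    kd = 0.0   # cross-entropy against the teacher's distribution
    for logits, q, y in zip(student_logits, teacher_probs, target_ids):
        log_p = log_softmax(logits)
        nll -= log_p[y]
        kd -= sum(q_k * lp_k for q_k, lp_k in zip(q, log_p))
    n = len(target_ids)
    return ((1.0 - kd_weight) * nll + kd_weight * kd) / n


if __name__ == "__main__":
    # Toy example: two target positions, vocabulary of three tokens.
    student = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]]
    teacher = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
    print(distillation_loss(student, teacher, target_ids=[0, 1]))
```

With `kd_weight=0.0` the sketch reduces to ordinary cross-entropy training on the reference translation, and with `kd_weight=1.0` the student is trained purely to match the teacher's distribution.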

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Robust Machine Translation with Domain Sensitive Pseudo-Sources: Baidu-OSU WMT19 MT Robustness Shared Task System Report
cs.CL · 2019-06 · unverdicted · novelty 3.0

Baidu-OSU WMT19 system achieves >10 BLEU gain on En-Fr and Fr-En social media translation via domain sensitive training and pseudo noisy sources.