Exploring Phoneme-Level Speech Representations for End-to-End Speech Translation

Alan W Black; Elizabeth Salesky; Matthias Sperber

arXiv: 1906.01199 · v1 · pith:AN7HAAIKnew · submitted 2019-06-04 · cs.CL · cs.SD · eess.AS




keywords: speech, translation, representations, create, end-to-end, features, frame-level, frames


Previous work on end-to-end translation from speech has primarily used frame-level features as speech representations, which creates longer, sparser sequences than text. We show that a naive method to create compressed phoneme-like speech representations is far more effective and efficient for translation than traditional frame-level speech features. Specifically, we generate phoneme labels for speech frames and average consecutive frames with the same label to create shorter, higher-level source sequences for translation. We see improvements of up to 5 BLEU on both our high- and low-resource language pairs, with a reduction in training time of 60%. Our improvements hold across multiple data sizes and two language pairs.
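The compression step described in the abstract (averaging consecutive frames that share a phoneme label) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `average_by_phoneme`, the toy two-dimensional frames, and the labels are hypothetical, and the sketch assumes that one phoneme label per frame has already been produced by some separate alignment or recognition step.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def average_by_phoneme(
    frames: Sequence[Sequence[float]],
    labels: Sequence[str],
) -> Tuple[List[List[float]], List[str]]:
    """Average each run of consecutive frames that share a label.

    Hypothetical helper: `frames` holds one feature vector per speech
    frame and `labels` holds one phoneme label per frame.
    """
    if len(frames) != len(labels):
        raise ValueError("frames and labels must have the same length")

    merged: List[List[float]] = []
    merged_labels: List[str] = []
    start = 0
    # groupby only merges adjacent equal labels, so a phoneme that
    # recurs later in the utterance yields a separate output vector.
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        segment = frames[start:start + length]
        # Element-wise mean over the frames in this run.
        merged.append([sum(column) / length for column in zip(*segment)])
        merged_labels.append(label)
        start += length
    return merged, merged_labels


# Toy input: five frames with two features each (made-up values).
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
labels = ["AH", "AH", "B", "AH", "AH"]

vectors, phones = average_by_phoneme(frames, labels)
print(phones)   # ['AH', 'B', 'AH']
print(vectors)  # [[2.0, 3.0], [5.0, 6.0], [8.0, 9.0]]
```

In this toy case five frames collapse to three vectors. The output length depends on the number of label changes, not the number of frames, which is how the source sequences become shorter.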
