Improving Speaker-Independent Lipreading with Domain-Adversarial Training

Juergen Schmidhuber; Michael Wand

arxiv: 1708.01565 · v1 · pith:DJFLFQERnew · submitted 2017-08-04 · 💻 cs.CV · cs.CL

keywords: target, training, accuracy, domain-adversarial, only, system, data, lipreading

We present a lipreading system, i.e., a speech recognition system using only visual features, which uses domain-adversarial training for speaker independence. Domain-adversarial training is integrated into the optimization of a lipreader based on a stack of feedforward and LSTM (Long Short-Term Memory) recurrent neural networks, yielding an end-to-end trainable system that requires only a very small number of frames of untranscribed target data to substantially improve the recognition accuracy on the target speaker. On pairs of different source and target speakers, we achieve a relative accuracy improvement of around 40% with only 15 to 20 seconds of untranscribed target speech data. On multi-speaker training setups, the accuracy improvements are smaller but still substantial.
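
As a rough illustration of the idea in the abstract, the sketch below shows one update step of domain-adversarial training in its commonly used gradient-reversal form. It is a toy under stated assumptions, not the system from the paper: the model is three scalar weights instead of a feedforward and LSTM stack, both the word label and the speaker label are binary, and the names `dat_step`, `w_f`, `w_y`, `w_d`, and `lam` are hypothetical. The abstract does not say which variant of domain-adversarial training the paper uses, so the reversal weight `lam` is an assumption too.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce_grad_logit(logit, label):
    """Gradient of binary cross-entropy w.r.t. the logit."""
    return sigmoid(logit) - label


def dat_step(params, x, word_label, speaker_label, lam=1.0, lr=0.1):
    """One SGD step of a toy domain-adversarial model (hypothetical sketch).

    params: dict with scalar weights
        'w_f'  shared feature extractor
        'w_y'  word classifier (trained on transcribed source data only)
        'w_d'  speaker (domain) classifier
    word_label is None for untranscribed target-speaker data.
    """
    w_f, w_y, w_d = params["w_f"], params["w_y"], params["w_d"]
    f = w_f * x  # shared feature

    # Speaker branch: needs only the speaker identity, no transcription.
    g_d = bce_grad_logit(w_d * f, speaker_label)
    grad_w_d = g_d * f
    grad_f_domain = g_d * w_d

    # Word branch: contributes only when a transcription is available.
    if word_label is not None:
        g_y = bce_grad_logit(w_y * f, word_label)
        grad_w_y = g_y * f
        grad_f_class = g_y * w_y
    else:
        grad_w_y = 0.0
        grad_f_class = 0.0

    # Gradient reversal: the speaker-loss gradient is sign-flipped and
    # scaled by lam before it reaches the shared feature extractor.
    grad_f = grad_f_class - lam * grad_f_domain
    grad_w_f = grad_f * x

    return {
        "w_f": w_f - lr * grad_w_f,
        "w_y": w_y - lr * grad_w_y,
        "w_d": w_d - lr * grad_w_d,
    }


start = {"w_f": 1.0, "w_y": 1.0, "w_d": 1.0}
# An untranscribed target-speaker sample: speaker label 1, no word label.
after = dat_step(start, x=1.0, word_label=None, speaker_label=1.0)
```

In this toy, the speaker classifier weight `w_d` moves to predict the speaker better, while the shared weight `w_f` moves the opposite way, making the feature less informative about the speaker. The word classifier is untouched by the untranscribed sample, which is why target data without transcriptions can still shape the shared features.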
