arxiv: 2105.03010 · v1 · pith:G7DZY5SUnew · submitted 2021-05-07 · 💻 cs.CL · cs.SD · eess.AS

Efficient Weight factorization for Multilingual Speech Recognition

classification 💻 cs.CL · cs.SD · eess.AS
keywords language · languages · multilingual · speech · weight · component · different · efficient

End-to-end multilingual speech recognition trains a single model on a combined speech corpus covering many languages, yielding one neural network that transcribes all of them. Because each language in the training data has different characteristics, the shared network may struggle to optimize for all languages simultaneously. In this paper we propose a novel multilingual architecture that targets the core operation in neural networks: linear transformation functions. The key idea of the method is to assign fast weight matrices to each language by decomposing each weight matrix into a shared component and a language-dependent component. The latter is then factorized into vectors under rank-1 assumptions to reduce the number of parameters per language. This efficient factorization scheme proves effective in two multilingual settings with $7$ and $27$ languages, reducing word error rates by $26\%$ and $27\%$ relative for two popular architectures, LSTM and Transformer, respectively.
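
The decomposition described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the shared and language-dependent components are combined, so the additive form W_lang = W_shared + r sᵀ used here is an assumption, and all names (`make_params`, `language_weight`, `linear`, `extra_params_per_language`) are hypothetical and not taken from the paper.

```python
import numpy as np


def make_params(d_out, d_in, languages, seed=0):
    """Create one shared matrix plus a pair of vectors (r, s) per language."""
    rng = np.random.default_rng(seed)
    shared = rng.standard_normal((d_out, d_in)) * 0.1
    per_lang = {
        lang: (rng.standard_normal(d_out) * 0.1, rng.standard_normal(d_in) * 0.1)
        for lang in languages
    }
    return shared, per_lang


def language_weight(shared, per_lang, lang):
    """Assumed combination: shared matrix plus a rank-1 language-specific term."""
    r, s = per_lang[lang]
    return shared + np.outer(r, s)


def linear(x, shared, per_lang, lang):
    """Apply the language-specific linear map to a batch x of shape (batch, d_in).

    Computes x @ (shared + r s^T)^T without building the full per-language matrix.
    """
    r, s = per_lang[lang]
    return x @ shared.T + np.outer(x @ s, r)


def extra_params_per_language(d_out, d_in):
    """Each language adds two vectors instead of a full d_out x d_in matrix."""
    return d_out + d_in


if __name__ == "__main__":
    shared, per_lang = make_params(d_out=4, d_in=3, languages=["de", "en"])
    x = np.ones((2, 3))
    y = linear(x, shared, per_lang, "de")
    print(y.shape)
    print(extra_params_per_language(1024, 1024), "extra parameters vs", 1024 * 1024)
```

Under this assumption, each additional language costs d_out + d_in parameters per weight matrix rather than d_out × d_in, which is the saving the rank-1 factorization is meant to provide.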

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Adding Robust Code-Switching Capabilities to High Performance Multilingual ASR

    cs.CL · 2026-06 · unverdicted · novelty 5.0

    Proposes Bayesian factorized adaptation for multilingual ASR to handle code-switching, reporting 32.87% fewer errors on switched words and 5.31% better overall WER while preserving monolingual accuracy with small synt...

  2. Multilingual Long-Form Speech Instruction Following: KIT's Submission to IWSLT 2026

    cs.CL · 2026-06 · unverdicted · novelty 4.0

    KIT's IWSLT submission uses segment concatenation, LLM label generation and cross-lingual translation to create >1M long-form training instances and shows that likelihood re-ranking harms semantic tasks unless combine...