Code-Switching Language Modeling using Syntax-Aware Multi-Task Learning

Andrea Madotto; Chien-Sheng Wu; Genta Indra Winata; Pascale Fung

arxiv: 1805.12070 · v2 · pith:S54226J5new · submitted 2018-05-30 · 💻 cs.CL

Genta Indra Winata, Andrea Madotto, Chien-Sheng Wu, Pascale Fung

classification 💻 cs.CL

keywords language · model · code-switching · modeling · data · issue · learning · multi-task

Lack of text data has been the major issue in code-switching language modeling. In this paper, we introduce a multi-task learning based language model that shares the syntactic representation of languages to leverage linguistic information and tackle the low-resource data issue. Our model jointly learns language modeling and Part-of-Speech tagging on code-switched utterances. In this way, the model is able to identify the locations of code-switching points and improve next-word prediction. Our approach outperforms a standard LSTM-based language model, with perplexity improvements of 9.7% and 7.4% on the SEAME Phase I and Phase II datasets, respectively.
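
The joint objective described in the abstract can be illustrated with a small sketch. The abstract does not say how the two losses are combined or how the perplexity improvement is computed, so everything below is an assumption made for illustration: the weighted-sum form of the loss, the `pos_weight` parameter, the function names, and the reading of "improvement" as a relative reduction. The toy probabilities and the 200/180 perplexities are made-up values, not results from the paper.

```python
import math

def cross_entropy(probs, targets):
    """Mean negative log-likelihood of the target index at each position."""
    return -sum(math.log(p[t]) for p, t in zip(probs, targets)) / len(targets)

def joint_loss(word_probs, word_targets, pos_probs, pos_targets, pos_weight=0.5):
    """Hypothetical multi-task loss: next-word loss plus weighted POS-tag loss."""
    lm = cross_entropy(word_probs, word_targets)
    pos = cross_entropy(pos_probs, pos_targets)
    return lm + pos_weight * pos, lm, pos

def perplexity(lm_loss):
    """Perplexity is the exponential of the mean next-word loss."""
    return math.exp(lm_loss)

def relative_improvement(baseline_ppl, model_ppl):
    """Percentage reduction in perplexity relative to a baseline (assumed metric)."""
    return 100.0 * (baseline_ppl - model_ppl) / baseline_ppl

# Toy predictions over a 3-word vocabulary and 2 POS tags (illustrative only).
word_probs = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25]]
word_targets = [0, 1]
pos_probs = [[0.5, 0.5], [0.25, 0.75]]
pos_targets = [0, 1]

total, lm, pos = joint_loss(word_probs, word_targets, pos_probs, pos_targets)
print(f"joint loss: {total:.4f}, LM perplexity: {perplexity(lm):.4f}")
print(f"relative improvement: {relative_improvement(200.0, 180.0):.1f}%")
```

In this sketch only the language-modeling term enters the perplexity; the POS term acts purely as an auxiliary training signal, which is one common way to set up such a multi-task objective.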

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

A Deep Generative Model for Code-Switched Text
cs.CL · 2019-06 · unverdicted · novelty 6.0

VACS is a two-level hierarchical VAE that generates diverse code-switched sentences, and augmenting monolingual data with its output reduces language model perplexity by 33.06%.