Code-switched Language Models Using Dual RNNs and Same-Source Pretraining

Preethi Jyothi; Saurabh Garg; Tanmay Parekh

arXiv:1809.01962v1 · pith:CHIZYBFI · submitted 2018-09-06 · cs.CL, cs.LG

Saurabh Garg, Tanmay Parekh, Preethi Jyothi

keywords: code-switched, language, text, dual, models, pretraining, techniques, building

This work focuses on building language models (LMs) for code-switched text. We propose two techniques that significantly improve these LMs: (1) a novel recurrent neural network unit with dual components, each focusing on one of the languages in the code-switched text, and (2) pretraining the LM on synthetic text sampled from a generative model estimated from the training data. We demonstrate the effectiveness of the proposed techniques on a Mandarin-English task, where they yield significant reductions in perplexity.
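The abstract evaluates its language models by perplexity. As a reminder of what that metric measures, here is a minimal Python sketch; it is not taken from the paper. The function names `perplexity` and `relative_reduction` and the per-token probabilities are hypothetical, chosen only for illustration.

```python
import math


def perplexity(token_probs):
    """Perplexity of a sequence, given the model's probability for each token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Average negative log-likelihood per token, then exponentiate.
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)


def relative_reduction(baseline_ppl, new_ppl):
    """Percentage by which new_ppl improves on baseline_ppl."""
    return 100.0 * (baseline_ppl - new_ppl) / baseline_ppl


# Hypothetical per-token probabilities for one four-token sentence,
# scored by a baseline LM and by an improved LM (illustrative values only).
baseline = perplexity([0.1, 0.2, 0.05, 0.1])
improved = perplexity([0.2, 0.2, 0.1, 0.1])
print(f"baseline={baseline:.2f} improved={improved:.2f} "
      f"reduction={relative_reduction(baseline, improved):.1f}%")
```

Lower perplexity means the model assigns higher probability to the held-out text, so a "reduction in perplexity" is an improvement.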

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

A Deep Generative Model for Code-Switched Text
cs.CL · 2019-06 · unverdicted · novelty 6.0

VACS is a two-level hierarchical VAE that generates diverse code-switched sentences, and augmenting monolingual data with its output reduces language model perplexity by 33.06%.