Massively Multilingual Adversarial Speech Recognition

David Yarowsky; Matthew Wiesner; Oliver Adams; Shinji Watanabe

arXiv:1904.02210 v1 · pith:RS7PFKZR · submitted 2019-04-03 · cs.CL, cs.LG

Keywords: languages, multilingual, objective, pretraining, recognition, speech, adaptation, additional

Abstract

We report on adaptation of multilingual end-to-end speech recognition models trained on as many as 100 languages. Our findings shed light on the relative importance of similarity between the target and pretraining languages along the dimensions of phonetics, phonology, language family, geographical location, and orthography. In this context, experiments demonstrate the effectiveness of two additional pretraining objectives in encouraging language-independent encoder representations: a context-independent phoneme objective paired with a language-adversarial classification objective.
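
The abstract names the two auxiliary objectives but does not say how the adversarial one is optimized. A common way to implement a language-adversarial objective is a gradient reversal layer (Ganin and Lempitsky, 2015): the language classifier is trained to minimize its loss, while the sign of that loss's gradient is flipped before it reaches the shared encoder, pushing the encoder toward language-independent features. The sketch below assumes that setup and shows it on a toy scalar model with hand-derived gradients. `Params`, `gradients`, `sgd_step`, the squared-error stand-ins for the recognition and phoneme losses, and the weights `lam_phn` and `lam_adv` are all hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Params:
    """Hypothetical toy model: one shared scalar 'encoder' weight and three heads."""
    w_enc: float   # shared encoder
    w_asr: float   # main recognition head
    w_phn: float   # auxiliary phoneme head
    w_lang: float  # language classifier (the adversary)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def head_losses(p: Params, x: float, y_asr: float, y_phn: float, lang: int):
    """Return the encoder output h and the three per-head losses."""
    h = p.w_enc * x
    l_asr = (p.w_asr * h - y_asr) ** 2   # squared error stands in for the ASR loss
    l_phn = (p.w_phn * h - y_phn) ** 2   # squared error stands in for the phoneme loss
    q = sigmoid(p.w_lang * h)            # P(lang == 1 | h)
    l_lang = -math.log(q if lang == 1 else 1.0 - q)  # binary cross-entropy
    return h, l_asr, l_phn, l_lang


def gradients(p: Params, x: float, y_asr: float, y_phn: float, lang: int,
              lam_phn: float = 0.5, lam_adv: float = 0.3) -> Params:
    """Gradients of J = L_asr + lam_phn*L_phn + lam_adv*L_lang, with a gradient
    reversal layer between the encoder and the language classifier."""
    h, _, _, _ = head_losses(p, x, y_asr, y_phn, lang)
    q = sigmoid(p.w_lang * h)
    r_asr = 2.0 * (p.w_asr * h - y_asr)  # dL_asr / d(ASR head output)
    r_phn = 2.0 * (p.w_phn * h - y_phn)  # dL_phn / d(phoneme head output)
    r_lang = q - lang                    # dL_lang / d(logit) for binary cross-entropy
    # Gradients flowing back into h from each head.
    d_asr_dh = r_asr * p.w_asr
    d_phn_dh = lam_phn * r_phn * p.w_phn
    d_lang_dh = lam_adv * r_lang * p.w_lang
    # Gradient reversal: the language term's sign is flipped on its way into the
    # encoder, so the encoder ascends L_lang while the classifier descends it.
    d_h = d_asr_dh + d_phn_dh - d_lang_dh
    return Params(
        w_enc=d_h * x,
        w_asr=r_asr * h,
        w_phn=lam_phn * r_phn * h,
        w_lang=lam_adv * r_lang * h,  # the classifier itself still minimizes L_lang
    )


def sgd_step(p: Params, g: Params, lr: float = 0.1) -> Params:
    """One plain gradient-descent update on every parameter."""
    return Params(p.w_enc - lr * g.w_enc, p.w_asr - lr * g.w_asr,
                  p.w_phn - lr * g.w_phn, p.w_lang - lr * g.w_lang)
```

Usage: `g = gradients(p, x=1.5, y_asr=2.0, y_phn=-1.0, lang=1)` followed by `p = sgd_step(p, g)`. The appeal of the reversal layer is that a single backward pass serves both players of the min-max game, instead of alternating separate encoder and classifier updates.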

Forward citations

Cited by 1 Pith paper

SPARCLE: SPeaker-aware Aligned Representations via Contrastive Language Embeddings
cs.CL · 2026-05 · unverdicted · novelty 4.0

SPARCLE builds speaker-aware grapheme representations by contrastively aligning characters with Wav2Vec2 acoustic embeddings conditioned on speaker identity, replacing grapheme-to-phoneme (G2P) conversion in TTS and halving WER in low-resource cases.