UR-BERT: Scaling Text Encoders for Massively Multilingual TTS Through Universal Romanization and Speech Token Prediction

2026 · cs.CL · arXiv 2606.11681

1 Pith paper cites this work. Polarity classification is still indexing.


abstract

We propose UR-BERT, a Romanized transcription-based text-to-speech (TTS) encoder for massively multilingual TTS systems. Conventional grapheme-to-phoneme (G2P)-based approaches are limited to around 100 languages by the scarcity of reliable G2P resources. In contrast, UR-BERT scales to 495 languages by unifying diverse writing systems into a shared Romanization representation. To further enhance phonetic fidelity and text-speech alignment, we introduce a speech token prediction objective during training, which encourages the encoder to learn speech-aware phonetic representations in a data-efficient manner. Experiments show that TTS systems built on UR-BERT consistently outperform recent text encoder baselines across a wide range of languages and resource conditions, and demonstrate strong generalization to unseen languages.
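
The idea of unifying writing systems into one Romanized representation can be illustrated with a toy sketch. This is not the paper's romanizer or tokenizer: the mapping table `TOY_TABLE` and the helpers `toy_romanize`, `build_vocab`, and `encode` are hypothetical names invented here, and the input strings are arbitrary character sequences chosen only to show that text in different scripts can end up sharing one small vocabulary.

```python
# Toy illustration only: a tiny hand-written table, NOT the romanizer
# used by UR-BERT. All names here are hypothetical.
TOY_TABLE = {
    "м": "m", "и": "i", "р": "r",  # a few Cyrillic letters
    "μ": "m", "ι": "i", "ρ": "r",  # a few Greek letters
}


def toy_romanize(text: str) -> str:
    """Map each character through TOY_TABLE; unknown characters pass through."""
    return "".join(TOY_TABLE.get(ch, ch) for ch in text.lower())


def build_vocab(texts):
    """Build a single character vocabulary shared by all romanized inputs."""
    chars = sorted({ch for t in texts for ch in t})
    return {ch: i for i, ch in enumerate(chars)}


def encode(text, vocab):
    """Convert a romanized string into a list of vocabulary ids."""
    return [vocab[ch] for ch in text]


# Three inputs in three scripts collapse to the same romanized string,
# so one encoder vocabulary covers all of them.
romanized = [toy_romanize(w) for w in ["мир", "μιρ", "mir"]]
vocab = build_vocab(romanized)
ids = [encode(r, vocab) for r in romanized]
```

After Romanization the three inputs produce identical id sequences over a three-symbol vocabulary, which is the property that lets a single encoder be shared across scripts. A real system would need a far more complete romanizer, and the speech token prediction objective is not covered by this sketch.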

representative citing papers

UR-BERT: Scaling Text Encoders for Massively Multilingual TTS Through Universal Romanization and Speech Token Prediction

cs.CL · 2026-06-10 · unverdicted · novelty 6.0

UR-BERT scales multilingual TTS encoders to 495 languages via Romanization unification and speech token prediction, outperforming baselines with better generalization.
