Exploring Pre-training Benefits on Phoneme Addition through Fine-tuning in Speech Synthesis

abstract

Transfer learning is widely used for low-resource text-to-speech. When the target corpus contains phonemes unseen in pre-training, the model must expand its phoneme inventory during fine-tuning; we call this process "phoneme addition." However, it remains unclear whether the pre-trained ability to generate seen phonemes contributes to this process. This study investigates phoneme addition in two settings: (1) a simulation setup using LLM-generated phoneme-controlled corpora, which allows investigation free of confounding factors, and (2) a real-speech cross-lingual transfer setup (English to Japanese), which validates whether the findings hold in practice. Experiments in both settings showed that although fine-tuning achieved higher naturalness than training from scratch, it required as much or more data to reach a comparable phoneme error rate (PER) on the new phonemes. These results indicate that pre-training mainly improves naturalness but offers limited benefit for phoneme addition.
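
The abstract does not say how PER is computed. As a minimal sketch, assuming the conventional definition (edit distance between a reference phoneme sequence and a recognized phoneme sequence, divided by the reference length), the metric could be written as follows. The function name `phoneme_error_rate` and the toy sequences are hypothetical and are not taken from the paper; how the authors restrict the score to new phonemes is not covered here.

```python
def phoneme_error_rate(reference, hypothesis):
    """Return (substitutions + deletions + insertions) / len(reference).

    Assumption: this is the conventional Levenshtein-based definition of PER,
    not necessarily the exact procedure used in the paper.
    """
    if not reference:
        raise ValueError("reference must contain at least one phoneme")

    # prev[j] is the edit distance between the reference prefix processed
    # so far and hypothesis[:j]; start with the empty reference prefix.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_ph in enumerate(reference, start=1):
        curr = [i]  # turning reference[:i] into an empty hypothesis costs i deletions
        for j, hyp_ph in enumerate(hypothesis, start=1):
            cost = 0 if ref_ph == hyp_ph else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference phoneme
                curr[j - 1] + 1,     # insertion of a hypothesis phoneme
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(reference)


# Toy example (hypothetical symbols): one substitution and one deletion
# against a four-phoneme reference.
per = phoneme_error_rate(["a", "b", "c", "d"], ["a", "x", "c"])
print(per)  # 0.5
```

Under this definition, a lower PER means the recognized phonemes match the intended ones more closely, and the value can exceed 1.0 when the hypothesis contains many insertions.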
