Towards building text-to-speech systems for the next billion users
2 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.SD (2)
years: 2026 (2)
verdicts: UNVERDICTED (2)
representative citing papers
A combination of phoneme romanization, targeted LoRA adaptation, and voice-prompt recovery enables commercial-class Indic TTS from a non-Indic base without acoustic retraining or commercial data.
citing papers explorer
- PSP: An Interpretable Per-Dimension Accent Benchmark for Indic Text-to-Speech
  PSP decomposes TTS accent into retroflex collapse rate, aspiration fidelity, vowel-length fidelity, Tamil-zha fidelity, FAD, and prosodic signature divergence, revealing that commercial systems vary in accent fidelity beyond WER scores.
- Praxy Voice: Voice-Prompt Recovery + BUPS for Commercial-Class Indic TTS from a Frozen Non-Indic Base at Zero Commercial-Training-Data Cost
  A combination of phoneme romanization, targeted LoRA adaptation, and voice-prompt recovery enables commercial-class Indic TTS from a non-Indic base without acoustic retraining or commercial data.