Controllable Accent Normalization via Discrete Diffusion
Existing accent normalization methods typically do not offer control over accent strength, yet many applications, such as language learning and dubbing, require tunable accent retention. We propose DLM-AN, a controllable accent normalization system built on masked discrete diffusion over self-supervised speech tokens. A Common Token Predictor identifies source tokens that likely encode native pronunciation; these tokens are selectively reused to initialize the reverse diffusion process. This provides a simple yet effective mechanism for controlling accent strength: reusing more tokens preserves more of the original accent. DLM-AN further incorporates a flow-matching Duration Ratio Predictor that automatically adjusts the total duration to better match the native rhythm. Experiments on multi-accent English data show that DLM-AN achieves the lowest word error rate among all compared systems while delivering competitive accent reduction and smooth, interpretable accent strength control.
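The token-reuse mechanism described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `init_reverse_state`, the `MASK_ID` value, the per-token scores, and the top-fraction selection rule are all assumptions made for illustration, and the sketch further assumes the source and target token sequences have the same length (it ignores the Duration Ratio Predictor).

```python
from typing import List, Sequence

MASK_ID = -1  # hypothetical id for the mask token; not taken from the paper


def init_reverse_state(
    source_tokens: Sequence[int],
    common_scores: Sequence[float],
    reuse_ratio: float,
    mask_id: int = MASK_ID,
) -> List[int]:
    """Build a partially masked sequence to start reverse diffusion from.

    common_scores[i] stands in for a Common Token Predictor output: a higher
    value means source token i is assumed more likely to be reusable.
    reuse_ratio in [0, 1] is the fraction of source tokens to keep.
    """
    if len(source_tokens) != len(common_scores):
        raise ValueError("tokens and scores must have the same length")
    if not 0.0 <= reuse_ratio <= 1.0:
        raise ValueError("reuse_ratio must be in [0, 1]")

    # Number of positions to keep, rounded down.
    n_keep = int(reuse_ratio * len(source_tokens))

    # Rank positions by score, highest first (stable for ties).
    ranked = sorted(range(len(source_tokens)), key=lambda i: -common_scores[i])
    keep = set(ranked[:n_keep])

    # Kept positions reuse the source token; all others start masked.
    return [tok if i in keep else mask_id for i, tok in enumerate(source_tokens)]


if __name__ == "__main__":
    tokens = [11, 22, 33, 44]
    scores = [0.9, 0.1, 0.8, 0.3]
    for ratio in (0.0, 0.5, 1.0):
        print(ratio, init_reverse_state(tokens, scores, ratio))
```

In this sketch `reuse_ratio` plays the role of the accent-strength control: at 0.0 every position starts masked and is regenerated, while at 1.0 the source tokens are kept unchanged.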
Forward citations
Cited by 1 Pith paper
- Accent Conversion: A Problem-Driven Survey of Sociolinguistic and Technical Constraints
The survey reviews the evolution of accent conversion from early DSP approaches to neural models, situating them in linguistic foundations and highlighting constraints, datasets, evaluations, and future directions.