arxiv: 2603.14275 · v2 · pith:P3AEW5EDnew · submitted 2026-03-15 · 📡 eess.AS · cs.AI · cs.SD

Controllable Accent Normalization via Discrete Diffusion

classification 📡 eess.AS cs.AI cs.SD
keywords accent, tokens, diffusion, dlm-an, normalization, strength, control, controllable

Existing accent normalization methods typically offer no control over accent strength, yet many applications, such as language learning and dubbing, require tunable accent retention. We propose DLM-AN, a controllable accent normalization system built on masked discrete diffusion over self-supervised speech tokens. A Common Token Predictor identifies source tokens that likely encode native pronunciation; these tokens are selectively reused to initialize the reverse diffusion process. This provides a simple yet effective mechanism for controlling accent strength: reusing more tokens preserves more of the original accent. DLM-AN further incorporates a flow-matching Duration Ratio Predictor that automatically adjusts the total duration to better match the native rhythm. Experiments on multi-accent English data show that DLM-AN achieves the lowest word error rate among all compared systems while delivering competitive accent reduction and smooth, interpretable accent strength control.
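
The token-reuse mechanism described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `init_reverse_diffusion`, the `MASK` id, the per-token `native_scores`, and the `reuse_ratio` knob are all hypothetical names introduced here, and the sketch assumes the source and output sequences have the same length (it ignores the Duration Ratio Predictor). It only shows how a fraction of source tokens might be kept while the rest are masked for the reverse diffusion process to fill in.

```python
from typing import List, Sequence

MASK = -1  # hypothetical id standing in for the masked token


def init_reverse_diffusion(
    source_tokens: Sequence[int],
    native_scores: Sequence[float],
    reuse_ratio: float,
) -> List[int]:
    """Keep the highest-scoring fraction of source tokens and mask the rest.

    native_scores[i] is an assumed per-token score (higher means the token
    is more likely to already encode native pronunciation), such as a
    predictor like the Common Token Predictor might output.
    """
    if len(source_tokens) != len(native_scores):
        raise ValueError("tokens and scores must have the same length")
    if not 0.0 <= reuse_ratio <= 1.0:
        raise ValueError("reuse_ratio must be in [0, 1]")

    n_keep = int(reuse_ratio * len(source_tokens))
    # Rank positions from highest to lowest score.
    ranked = sorted(
        range(len(source_tokens)),
        key=lambda i: native_scores[i],
        reverse=True,
    )
    keep = set(ranked[:n_keep])
    # Reused tokens stay fixed; masked positions are left for the
    # diffusion model (not shown here) to generate.
    return [tok if i in keep else MASK for i, tok in enumerate(source_tokens)]


if __name__ == "__main__":
    tokens = [11, 22, 33, 44]
    scores = [0.9, 0.1, 0.8, 0.3]
    print(init_reverse_diffusion(tokens, scores, reuse_ratio=0.5))
```

Under these assumptions, raising `reuse_ratio` keeps more source tokens and so, per the abstract, preserves more of the original accent, while `reuse_ratio=0.0` masks every position.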

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Accent Conversion: A Problem-Driven Survey of Sociolinguistic and Technical Constraints

    cs.SD 2026-04 unverdicted novelty 2.0

    The survey reviews the evolution of accent conversion from early DSP approaches to neural models, situating them in linguistic foundations and highlighting constraints, datasets, evaluations, and future directions.