A phonological-level Wav2Vec2 CTC framework for Mandarin MDD reduces FAR by 10.1% and DER by 23.6% versus a phoneme-only baseline by jointly modeling segmental and tonal attributes.
Using Phonological-Level Wav2Vec2 for Mandarin Automatic Mispronunciation Detection and Diagnosis
abstract
Automatic mispronunciation detection and diagnosis (MDD) plays a crucial role in second-language (L2) Mandarin pronunciation learning. Although end-to-end (E2E) MDD methods have substantially improved phoneme-level detection accuracy, their diagnostic feedback remains limited because segmental and tonal errors are not explicitly separated. In this paper, we propose a phonological feature-based MDD framework that models both segmental and tonal attributes within a unified Wav2Vec2 CTC architecture. Experimental results show that the proposed method reduces the False Acceptance Rate (FAR) by 10.1% and the Diagnostic Error Rate (DER) by 23.6% compared with the phoneme-only baseline system. By decomposing phonemes into low-level phonological components, the proposed approach enables more detailed and interpretable diagnostic feedback for L2 learners.
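The abstract reports FAR and DER without stating their formulas. The sketch below is an illustration only, assuming the hierarchical MDD evaluation commonly used in the literature: FAR = FA / (FA + TR) and DER = DE / (CD + DE), where true rejections (TR) split into correct diagnoses (CD) and diagnostic errors (DE). It also assumes a coarse initial/final/tone split of tone-numbered pinyin as a stand-in for the paper's phonological attributes, which are not listed here, and it assumes the three sequences are already aligned syllable by syllable. The names `split_syllable`, `tally`, and `rates` are hypothetical and do not come from the paper.

```python
from collections import Counter

# Pinyin initials, digraphs first so "zh" is tried before "z".
# Syllables without one of these (e.g. "an1", "wang2") get an empty initial.
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s"]


def split_syllable(syl):
    """Split tone-numbered pinyin, e.g. 'zhang1' -> ('zh', 'ang', 1)."""
    body, tone = syl[:-1], int(syl[-1])
    for ini in INITIALS:
        if body.startswith(ini):
            return ini, body[len(ini):], tone
    return "", body, tone


def tally(canonical, annotated, predicted):
    """Count outcomes per attribute slot (initial, final, tone).

    canonical: what the learner should say
    annotated: what a human annotator heard (ground truth)
    predicted: what the model recognized
    """
    counts = Counter()
    for c_syl, a_syl, p_syl in zip(canonical, annotated, predicted):
        slots = zip(split_syllable(c_syl), split_syllable(a_syl),
                    split_syllable(p_syl))
        for c, a, p in slots:
            if a == c:                       # pronounced correctly
                counts["TA" if p == c else "FR"] += 1
            elif p == c:                     # error missed by the model
                counts["FA"] += 1
            else:                            # error detected (true rejection)
                counts["CD" if p == a else "DE"] += 1
    return counts


def rates(counts):
    """Turn outcome counts into FRR, FAR and DER (0.0 when undefined)."""
    def ratio(num, den):
        return num / den if den else 0.0

    tr = counts["CD"] + counts["DE"]
    return {
        "FRR": ratio(counts["FR"], counts["TA"] + counts["FR"]),
        "FAR": ratio(counts["FA"], counts["FA"] + tr),
        "DER": ratio(counts["DE"], tr),
    }


# Toy data, invented for illustration only.
canonical = ["zhang1", "ma3", "shi4"]
annotated = ["zang1", "ma2", "shi4"]   # learner errors: initial, tone
predicted = ["zang1", "ma3", "si4"]    # model output
counts = tally(canonical, annotated, predicted)
print(dict(counts), rates(counts))
```

Because each slot is scored separately, the segmental error in the first syllable (initial `zh` realized as `z`) and the tonal error in the second (tone 3 realized as tone 2) land in different slots, which is the kind of separation between segmental and tonal feedback that the abstract describes.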
fields
eess.AS (1)
years
2026 (1)