MultiClin benchmark shows multiscript-aware evaluation is fairer than single-reference metrics for clinical ASR, and script unification during training yields the best performance.
When Multiple Scripts Matter: Evaluating ASR in Clinical Settings
abstract
Automatic speech recognition (ASR) in non-English clinical settings is challenged by multiscript variability, where the same term may appear in multiple valid orthographic forms. Conventional string-matching evaluation metrics often underestimate ASR performance by treating orthographic variants as errors. To address this issue, we introduce MultiClin, a clinical ASR benchmark designed to evaluate robustness to multiscript variability. Experiments across diverse ASR models show that multiscript-aware evaluation provides a fairer assessment of recognition quality than conventional single-reference evaluation. We further investigate the impact of script consistency during training and find that inconsistent script mappings increase orthographic uncertainty and hinder model convergence, with a balanced 50% mapping ratio producing the highest entropy. In contrast, script unification consistently yields the best ASR performance. Our dataset and code are publicly available at: https://github.com/aitrics-ronaldo/Interspeech_MultiClin.
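The two ideas in the abstract can be illustrated with a minimal sketch. It is not the MultiClin implementation: the function names `variant_aware_wer` and `mapping_entropy`, the `variants` dictionary format, and the Korean/Latin example pair are all assumptions made for illustration. The entropy function assumes a two-way script choice measured with binary Shannon entropy, which is consistent with the reported peak at a 50% mapping ratio but may differ from the paper's exact definition.

```python
import math


def variant_aware_wer(reference, hypothesis, variants=None):
    """Word error rate in which a hypothesis token counts as correct if it
    equals the reference token or any of its listed orthographic variants.

    `variants` is a hypothetical mapping: reference token -> set of accepted forms.
    With `variants=None` this reduces to ordinary single-reference WER.
    """
    variants = variants or {}
    ref = reference.split()
    hyp = hypothesis.split()

    def same(ref_tok, hyp_tok):
        return ref_tok == hyp_tok or hyp_tok in variants.get(ref_tok, ())

    # Token-level Levenshtein distance, keeping only the previous DP row.
    prev = list(range(len(hyp) + 1))
    for i, ref_tok in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hyp, start=1):
            cost = 0 if same(ref_tok, hyp_tok) else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / max(len(ref), 1)


def mapping_entropy(p):
    """Binary Shannon entropy (in bits) of mapping a term to one script with
    probability p and to the other script with probability 1 - p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


# Illustrative example (assumed, not taken from the dataset):
# "CT" written in Latin script versus its Hangul transliteration.
reference = "환자 CT 촬영"
hypothesis = "환자 씨티 촬영"
accepted = {"CT": {"씨티"}}

strict = variant_aware_wer(reference, hypothesis)
aware = variant_aware_wer(reference, hypothesis, accepted)
```

Under these assumptions, the strict score penalizes the transliterated term as a substitution while the variant-aware score does not, and `mapping_entropy` is largest at `p = 0.5` and zero when the mapping is fully unified (`p = 0` or `p = 1`).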
fields: cs.CL
year: 2026