Script collapse in multilingual ASR: A reference-free metric and 100-pair benchmark

1 Pith paper cites this work.
abstract

Word error rate (WER) is the dominant metric for automatic speech recognition, yet it cannot detect a systematic failure mode: models that produce fluent output in the wrong writing system. We define Script Fidelity Rate (SFR), the fraction of hypothesis characters in the target script block, computable without reference transcriptions, and report a systematic measurement of script collapse across ten languages spanning six writing systems and ten models (seven Whisper sizes, MMS-1B, SeamlessM4T-v2, and Gemma 4 E2B) on FLEURS test sets. Across 100 evaluated model-language pairs, 21 (21%; 95% Wilson CI: 14-30%) exhibit script collapse (SFR less than 10%): 20 involve Whisper and one involves Gemma 4 E2B on Urdu under a generic transcription prompt. In a ten-language Gemma 4 probe, script-aware prompting raises mean SFR from 71.2% to 97.7%, fixes Urdu collapse (6.5% to 97.0%), and recovers 5.9 chrF on downstream NLLB translation for the six languages whose baseline SFR is below 90%. We identify four collapse patterns: Latin phonetic substitution, Arabic substitution for Somali, Devanagari substitution for Bengali/Malayalam, and unique-script Latin collapse for Georgian.
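
The two quantities the abstract relies on, SFR and the Wilson interval on the collapse rate, can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The names `SCRIPT_BLOCKS`, `script_fidelity_rate`, and `wilson_interval` are hypothetical. The block ranges, the choice to count only letters and combining marks (ignoring spaces, digits, and punctuation), and the return value of 0.0 for an empty hypothesis are all assumptions, since the abstract does not specify them.

```python
import math
import unicodedata

# Assumed Unicode ranges (inclusive) per script; the paper's exact list is not given.
SCRIPT_BLOCKS = {
    "latin": [(0x0041, 0x005A), (0x0061, 0x007A), (0x00C0, 0x024F)],
    "arabic": [(0x0600, 0x06FF)],
    "devanagari": [(0x0900, 0x097F)],
    "bengali": [(0x0980, 0x09FF)],
    "malayalam": [(0x0D00, 0x0D7F)],
    "georgian": [(0x10A0, 0x10FF)],
}

COLLAPSE_THRESHOLD = 0.10  # the abstract defines collapse as SFR below 10%


def script_fidelity_rate(hypothesis: str, script: str) -> float:
    """Fraction of counted hypothesis characters that fall in the target script.

    Assumption: only letters (L*) and combining marks (M*) are counted, so
    whitespace, digits, and punctuation do not affect the score.
    """
    blocks = SCRIPT_BLOCKS[script]
    counted = [ch for ch in hypothesis if unicodedata.category(ch)[0] in "LM"]
    if not counted:
        return 0.0  # assumption: an empty hypothesis scores zero
    in_script = sum(
        any(lo <= ord(ch) <= hi for lo, hi in blocks) for ch in counted
    )
    return in_script / len(counted)


def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / trials
    denom = 1.0 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials**2)) / denom
    return center - half, center + half


if __name__ == "__main__":
    # Devanagari target, Latin phonetic output: no characters in the target script.
    sfr = script_fidelity_rate("namaste duniya", "devanagari")
    print(sfr, sfr < COLLAPSE_THRESHOLD)
    # 21 collapsed pairs out of 100, as reported in the abstract.
    print(wilson_interval(21, 100))
```

Because the score needs only the hypothesis and the name of the target script, it can be computed on unlabeled audio, which is what makes it reference-free. With 21 collapsed pairs out of 100, the interval function returns bounds that round to the 14-30% range stated in the abstract.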

fields

cs.CL 1

years

2026 1
