Evaluation of open-source and commercial ASR models on narrow-band Hindi and Indian English shows poor zero-shot results and inconsistent fine-tuning benefits tied to pretraining exposure.
Responsible ASR: Overcoming Challenges of Foundational Models in Narrow-Band and Low-Resource Settings
Abstract
Telephony conversations worldwide are conducted over narrow-band channels and are often spontaneous and colloquial. This paper evaluates the performance of widely used foundational automatic speech recognition (ASR) models, both open-source and commercial, on narrow-band conversations in Hindi, a low-resource language, and in Indian-accented English, a low-resource accent. We first assess these models in a zero-shot setting and find that their performance remains suboptimal across the board. Given these challenges in narrow-band and low-resource scenarios, we further investigate the impact of fine-tuning open-source models on a limited set of real-life annotated recordings. Our findings indicate that while fine-tuning yields some improvements, its effectiveness varies across languages and accents and is largely influenced by the amount of data the models encountered during pretraining.
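The abstract does not name its evaluation metric. Word error rate (WER) is the conventional way to score ASR transcripts, so the following is a minimal sketch of it. It is illustrative only, not the authors' evaluation code. It assumes whitespace tokenization and transcripts that have already been normalized (casing, punctuation, script), and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Assumption: transcripts are pre-normalized and tokenized on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the first (i - 1) reference words
    # and the first j hypothesis words (word-level Levenshtein distance).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or exact match)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # Hypothetical romanized Hindi example: one substituted word out of four.
    print(word_error_rate("mera order kab aayega", "mera order kab ayega"))
```

Running the example prints `0.25`. Because WER counts exact word matches, spelling variants such as "aayega" and "ayega" count as errors. For colloquial Hindi and code-mixed speech, the text normalization applied before scoring can therefore shift results noticeably.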