VAANI: Capturing the language landscape for an inclusive digital India

2026 · eess.AS · arXiv 2603.28714

3 Pith papers cite this work. Polarity classification is still indexing.

abstract

Voice-based technologies have the potential to bridge digital accessibility gaps; however, existing datasets fail to capture the linguistic and regional diversity of Indic languages. We present Project VAANI, a large-scale multimodal dataset designed to represent India's linguistic landscape across 165 districts. Speech data is collected using image-based prompts to elicit spontaneous responses, while images are curated through a separate pipeline covering diverse themes across regions. The dataset undergoes a rigorous multi-stage quality control process, combining automated and manual evaluation to ensure high audio quality and transcription accuracy. We release approximately 289K images, 31,255 hours of speech, and 2,043 hours of transcribed audio spanning 105 languages from 28 states and 3 union territories. Many of these languages are represented at this scale for the first time, making VAANI a foundational resource for inclusive speech technology. The dataset enables the development of robust, multilingual, and multimodal models, and supports research in speech recognition, language understanding, and cross-modal learning for underrepresented languages.

representative citing papers

A Comparative Study of Pre-trained Speech Encoders and Training Objectives for Large-Scale Indic Spoken Language Identification

eess.AS · 2026-06-08 · unverdicted · novelty 5.0

Frozen FastConformer with hierarchical softmax achieves over 90% macro accuracy on out-of-domain Indic LID benchmarks for 42 languages and outperforms Whisper and other objectives in cross-corpus settings.

Factors affecting ASR performance: A study using state of the art ASR models in Indic Languages

eess.AS · 2026-06-08 · unverdicted · novelty 4.0

Empirical analysis of speaker and acoustic factors correlated with ASR word error rates across five Indic languages using zero-shot evaluation on multiple open-source models.

A study on the impact of region specific data on the performance of Indic ASR

eess.AS · 2026-06-08 · unverdicted · novelty 3.0

Empirical study finds consistent positive correlation between inter-district geographic distance and ASR word error rate when models are finetuned on single-district Indic speech data.

citing papers explorer

Showing 3 of 3 citing papers.

A Comparative Study of Pre-trained Speech Encoders and Training Objectives for Large-Scale Indic Spoken Language Identification
eess.AS · 2026-06-08 · unverdicted · none · ref 15 · internal anchor
Factors affecting ASR performance: A study using state of the art ASR models in Indic Languages
eess.AS · 2026-06-08 · unverdicted · none · ref 34 · internal anchor
A study on the impact of region specific data on the performance of Indic ASR
eess.AS · 2026-06-08 · unverdicted · none · ref 14 · internal anchor
