Vaani Benchmark V1.0: An Inclusive Multimodal Benchmark Dataset for Hindi

Agneedh Basu; Nihar Desai; Pavan Kumar J; Pranav Bhat; Prasanta Kumar Ghosh; Saurabh Kumar; Sujith Pulikodan; Visruth Sanka

arxiv: 2606.21408 · v1 · pith:G2HIZZXLnew · submitted 2026-06-19 · 📡 eess.AS

classification 📡 eess.AS

keywords benchmark, dataset, evaluation, hindi, inclusive, across, demographic, multimodal

Benchmarking is critical for the systematic evaluation and comparison of automatic speech recognition (ASR) systems. While several open-source datasets are available for Hindi ASR, existing benchmarks remain limited in geographic diversity, demographic representation, and transcription robustness. We introduce an inclusive, multimodal Hindi ASR benchmark collected from 104 districts across India. The dataset consists of spontaneous speech elicited using image prompts and recorded in real-world acoustic conditions across diverse demographic groups. Each audio segment is annotated with three independent transcriptions, enabling multi-reference evaluation that accounts for permissible orthographic and lexical variations. This design supports more robust, inclusive, and realistic ASR evaluation. We benchmark multiple open-source and proprietary ASR models and report their comparative performance on the benchmark dataset.
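
The abstract does not spell out how the three transcriptions per segment are combined into a score. As a minimal sketch, assuming one common convention (score each hypothesis against every reference and keep the lowest word error rate), multi-reference evaluation could look like the following. The function names `wer` and `multi_reference_wer`, the whitespace tokenization, and the example sentences are illustrative assumptions, not the authors' implementation or data.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate of a hypothesis against a single reference."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        return 0.0 if not hyp_words else 1.0
    return edit_distance(ref_words, hyp_words) / len(ref_words)


def multi_reference_wer(references: List[str], hypothesis: str) -> float:
    """Assumed convention: take the minimum WER over all references."""
    return min(wer(ref, hypothesis) for ref in references)


# Hypothetical example: three transcriptions that differ only in
# permissible spelling or word-form variants of the same utterance.
references = [
    "यह एक गाय है",
    "ये एक गाय है",
    "यह एक गाय हैं",
]
hypothesis = "ये एक गाय है"

single_ref_score = wer(references[0], hypothesis)
multi_ref_score = multi_reference_wer(references, hypothesis)
```

Under this assumption, a hypothesis that matches any one acceptable transcription is not penalized for differing from the others, which is the effect the abstract attributes to multi-reference evaluation.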
