Pith Number

pith:D7PUEU5N

pith:2026:D7PUEU5NE7WMGW3YHMST5WJF4K
not attested · not anchored · not stored · refs pending

Voice of India: A Large-Scale Benchmark for Real-World Speech Recognition in India

Aaditya Pareek, Amritansh Walecha, Bhaskar Singh, Hanuman Sidh, Kaushal Bhogale, Mahima Manik, Manas Dhir, Manmeet Kaur, Mitesh M. Khapra, Sagar Jain, Shobhit Banga, Tahir Javed, Utkarsh Singh, Vanshika Chhabra

A benchmark of unscripted phone conversations reveals gaps in current speech recognition for Indian languages.

arxiv:2604.19151 v2 · 2026-04-21 · cs.CL · cs.SD · eess.AS

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{D7PUEU5NE7WMGW3YHMST5WJF4K}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

We introduce Voice of India, a closed-source benchmark built from unscripted telephonic conversations covering 15 major Indian languages across 139 regional clusters. The dataset contains 306,230 utterances, totaling 536 hours of speech from 36,691 speakers, with transcripts that account for spelling variations.

C2 weakest assumption

That the unscripted telephonic conversations and manually created transcripts with spelling variants provide a meaningfully superior and unbiased representation of real-world Indic speech compared to existing scripted benchmarks.

C3 one-line summary

Voice of India is a new 536-hour benchmark of real telephonic conversations in 15 Indian languages with variant-aware transcripts for more realistic ASR evaluation.

Cited by

1 paper in Pith

Receipt and verification
First computed: 2026-05-26T02:03:13.254624Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

1fdf4253ad27ecc35b783b253ed925e2a3d939be6e48bf77e738fce774c8186c
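
For this record, the Pith Number matches the Base32 (RFC 4648) encoding of the canonical hash, cut to its first 26 characters. The sketch below checks that relationship for this one record only. The function name `pith_from_hash` and the truncation rule are assumptions inferred from the values on this page, not a documented derivation.

```python
import base64


def pith_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode a hex digest and keep the first `length` characters.

    Assumed rule, inferred from this record; not an official specification.
    """
    raw = bytes.fromhex(hex_digest)
    return base64.b32encode(raw).decode("ascii")[:length]


canonical_hash = "1fdf4253ad27ecc35b783b253ed925e2a3d939be6e48bf77e738fce774c8186c"
pith_number = pith_from_hash(canonical_hash)
print(pith_number)  # D7PUEU5NE7WMGW3YHMST5WJF4K
```

Because the identifier is a prefix of the encoded hash, a changed canonical record would produce a different Pith Number under this assumed rule.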

Aliases

arxiv: 2604.19151 · arxiv_version: 2604.19151v2 · doi: 10.48550/arxiv.2604.19151 · pith_short_12: D7PUEU5NE7WM · pith_short_16: D7PUEU5NE7WMGW3Y · pith_short_8: D7PUEU5N
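
The three `pith_short_*` aliases above are the leading 8, 12, and 16 characters of the full Pith Number. A minimal sketch of that relationship follows. The helper `short_aliases` is hypothetical, and plain prefix truncation is an assumption drawn from this record.

```python
def short_aliases(pith_number: str, lengths=(8, 12, 16)) -> dict:
    """Return short aliases as prefixes of the full Pith Number (assumed rule)."""
    return {f"pith_short_{n}": pith_number[:n] for n in lengths}


aliases = short_aliases("D7PUEU5NE7WMGW3YHMST5WJF4K")
for name, value in aliases.items():
    print(f"{name}: {value}")
```

Shorter prefixes are easier to cite but more likely to collide, so any resolver would presumably need the longer forms to disambiguate.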
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/D7PUEU5NE7WMGW3YHMST5WJF4K \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 1fdf4253ad27ecc35b783b253ed925e2a3d939be6e48bf77e738fce774c8186c
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "9993f2ea66cf5e71b41fb55b1966149441257c584f411c41a18d4db9b2811203",
    "cross_cats_sorted": [
      "cs.SD",
      "eess.AS"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2026-04-21T07:02:01Z",
    "title_canon_sha256": "ab9dae9f03e42c5931c9d05c16c4c71aaee4d452112e23beb4079d62dbe6ca3f"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2604.19151",
    "kind": "arxiv",
    "version": 2
  }
}
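
The verification pipeline above can also be run offline against the record as printed on this page. The sketch below applies the same serialization as the one-liner (sorted keys, compact separators, non-ASCII left unescaped, UTF-8) and hashes the result with SHA-256. The function name `canonical_sha256` is hypothetical. If the record shown here is complete and the builder uses this same rule, the printed digest should equal the canonical hash listed above; that has not been confirmed here.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record after deterministic JSON serialization."""
    payload = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no insignificant whitespace
        ensure_ascii=False,      # keep non-ASCII characters as UTF-8
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Canonical record copied from this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "9993f2ea66cf5e71b41fb55b1966149441257c584f411c41a18d4db9b2811203",
        "cross_cats_sorted": ["cs.SD", "eess.AS"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2026-04-21T07:02:01Z",
        "title_canon_sha256": "ab9dae9f03e42c5931c9d05c16c4c71aaee4d452112e23beb4079d62dbe6ca3f",
    },
    "schema_version": "1.0",
    "source": {"id": "2604.19151", "kind": "arxiv", "version": 2},
}

digest = canonical_sha256(record)
print(digest)
```

Sorting keys and fixing the separators is what makes the hash reproducible: two mirrors that load the same record in a different key order still serialize it to identical bytes.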