
Omnilingual ASR: Open-source multilingual speech recognition for 1600+ languages

Omnilingual ASR team, Gil Keren, Artyom Kozhevnikov, Yen Meng, Christophe Ropers, Matthew Setzler, Skyler Wang, Ife Adebara, Michael Auli, Can Balioglu, Kevin Chan, Chierh Cheng, Joe Chuang, Caley Droof, Mark Duppenthaler, Paul-Ambroise Duq · 2025 · arXiv 2511.09690

13 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: method (1)

citation-polarity summary: use method (1)

representative citing papers

Building Community-Centred NLP Resources for Puno Quechua

cs.CL · 2026-05-27 · unverdicted · novelty 7.0

First dedicated ASR corpus of 66 hours and systematic benchmarks for Puno Quechua using participatory collection and open release of data and fine-tuned models.

UrduSpeech: A 156-Hour Urdu Speech Corpus with 12-Dimension Paralinguistic Annotations

eess.AS · 2026-05-18 · accept · novelty 7.0

UrduSpeech is a 156-hour high-fidelity Urdu speech corpus with 12-dimension paralinguistic annotations, a 9-hour manually corrected benchmark, and open-source release to support speech technology for an under-resourced language.

Phonemes vs. Projectors: An Investigation of Speech-Language Interfaces for LLM-based ASR

eess.AS · 2026-04-10 · unverdicted · novelty 7.0

Phoneme-based interfaces match or surpass projector-based ones for LLM ASR, especially in low-resource languages, and a BPE-phoneme hybrid offers additional improvements.

Benchmarking Multilingual Speech Models on Pashto: Zero-Shot ASR, Script Failure, and Cross-Domain Evaluation

cs.CL · 2026-04-06 · conditional · novelty 7.0

Multilingual ASR models show 39.7-297% zero-shot WER on Pashto public data, Whisper models output correct script in under 0.8% of cases, and fine-tuned models degrade to 32.5-59% WER on out-of-domain sets.

JSPG: Dynamic Dictionary Filtering via Joint Semantic-Pinyin-Glyph Retrieval for Chinese Contextual ASR

cs.CL · 2026-05-16 · unverdicted · novelty 6.0

JSPG combines semantic, pinyin, and glyph retrieval with an extended Smith-Waterman algorithm to dynamically filter keyword dictionaries and improve accuracy in Chinese contextual ASR.

AfriVox-v2: A Domain-Verticalized Benchmark for In-the-Wild African Speech Recognition

cs.CL · 2026-05-05 · unverdicted · novelty 6.0

AfriVox-v2 is a benchmark that evaluates modern speech models on in-the-wild African audio with domain-specific tests for sectors including government, finance, health, and agriculture.

BlasBench: An Open Benchmark for Irish Speech Recognition

cs.CL · 2026-04-12 · conditional · novelty 6.0

BlasBench supplies an Irish-aware normalizer and scoring harness that enables reproducible ASR comparisons and exposes a 33-43 point generalization gap for fine-tuned models versus 7-10 points for massively multilingual ones.

OmniVoice: Towards Omnilingual Zero-Shot Text-to-Speech with Diffusion Language Models

cs.CL · 2026-04-01 · unverdicted · novelty 6.0

OmniVoice introduces a diffusion language model-style non-autoregressive TTS system that directly maps text to multi-codebook acoustic tokens, scaling zero-shot synthesis to over 600 languages with SOTA results on multilingual benchmarks using 581k hours of open data.

SALSA: Speech Aware LLM Adaptation via Learned Steering Activation Vectors

cs.CL · 2026-05-30 · unverdicted · novelty 5.0

SALSA adapts speech-aware LLMs via supervised layer-wise steering vectors, reporting up to 46.8% relative gains over zero-shot on out-of-domain speech benchmarks.

PashtoTTS-Bench: automated screening for low-resource non-Latin-script text-to-speech

cs.CL · 2026-05-26 · unverdicted · novelty 5.0

Introduces INSV-A, an automated screening benchmark for Pashto TTS systems, reporting WER, script fidelity, and language identification (LID) results across five systems on FLEURS and Common Voice prompts.

Comparing Human and Automatic Recognition of Dutch Dysarthric Continuous Speech: A Case Study

cs.CL · 2026-06-29 · unverdicted · novelty 4.0

Case study finds that fine-tuned ASR models outperform human listeners on Dutch dysarthric continuous speech from one speaker, lowering WER from over 70% to over 23%.

The Annotation Scarcity Paradox in Low-Resource NLP Evaluation: A Decade of Acceleration and Emerging Constraints

cs.CL · 2026-05-18 · unverdicted · novelty 4.0 · 3 refs

A narrative survey of low-resource NLP evaluation identifies the Annotation Scarcity Paradox as a structural mismatch between scalable models and scarce sociolinguistic evaluation capacity.

NaijaS2ST: A Multi-Accent Benchmark for Speech-to-Speech Translation in Low-Resource Nigerian Languages

cs.SD · 2026-04-17

citing papers explorer

Building Community-Centred NLP Resources for Puno Quechua cs.CL · 2026-05-27 · unverdicted · none · ref 3
UrduSpeech: A 156-Hour Urdu Speech Corpus with 12-Dimension Paralinguistic Annotations eess.AS · 2026-05-18 · accept · none · ref 17
Phonemes vs. Projectors: An Investigation of Speech-Language Interfaces for LLM-based ASR eess.AS · 2026-04-10 · unverdicted · none · ref 30
Benchmarking Multilingual Speech Models on Pashto: Zero-Shot ASR, Script Failure, and Cross-Domain Evaluation cs.CL · 2026-04-06 · conditional · none · ref 6
JSPG: Dynamic Dictionary Filtering via Joint Semantic-Pinyin-Glyph Retrieval for Chinese Contextual ASR cs.CL · 2026-05-16 · unverdicted · none · ref 13
AfriVox-v2: A Domain-Verticalized Benchmark for In-the-Wild African Speech Recognition cs.CL · 2026-05-05 · unverdicted · none · ref 17
BlasBench: An Open Benchmark for Irish Speech Recognition cs.CL · 2026-04-12 · conditional · none · ref 13
OmniVoice: Towards Omnilingual Zero-Shot Text-to-Speech with Diffusion Language Models cs.CL · 2026-04-01 · unverdicted · none · ref 57
SALSA: Speech Aware LLM Adaptation via Learned Steering Activation Vectors cs.CL · 2026-05-30 · unverdicted · none · ref 71
PashtoTTS-Bench: automated screening for low-resource non-Latin-script text-to-speech cs.CL · 2026-05-26 · unverdicted · none · ref 9
Comparing Human and Automatic Recognition of Dutch Dysarthric Continuous Speech: A Case Study cs.CL · 2026-06-29 · unverdicted · none · ref 42
The Annotation Scarcity Paradox in Low-Resource NLP Evaluation: A Decade of Acceleration and Emerging Constraints cs.CL · 2026-05-18 · unverdicted · none · ref 78 · 3 links
NaijaS2ST: A Multi-Accent Benchmark for Speech-to-Speech Translation in Low-Resource Nigerian Languages cs.SD · 2026-04-17 · unreviewed · ref 7