Omnilingual ASR: Open-source multilingual speech recognition for 1600+ languages
9 Pith papers cite this work. Polarity classification is still indexing.
citation summary
years: 2026 (9)
roles: method (1)
polarities: use method (1)
citing papers explorer
- UrduSpeech: A 156-Hour Urdu Speech Corpus with 12-Dimension Paralinguistic Annotations
  UrduSpeech is a 156-hour high-fidelity Urdu speech corpus with 12-dimension paralinguistic annotations, a 9-hour manually corrected benchmark, and open-source release to support speech technology for an under-resourced language.
- Phonemes vs. Projectors: An Investigation of Speech-Language Interfaces for LLM-based ASR
  Phoneme-based interfaces match or surpass projector-based ones for LLM ASR, especially in low-resource languages, and a BPE-phoneme hybrid offers additional improvements.
- Benchmarking Multilingual Speech Models on Pashto: Zero-Shot ASR, Script Failure, and Cross-Domain Evaluation
  Multilingual ASR models show 39.7-297% zero-shot WER on Pashto public data, Whisper models output correct script in under 0.8% of cases, and fine-tuned models degrade to 32.5-59% WER on out-of-domain sets.
- JSPG: Dynamic Dictionary Filtering via Joint Semantic-Pinyin-Glyph Retrieval for Chinese Contextual ASR
  JSPG jointly combines semantic, pinyin, and glyph retrieval with an extended Smith-Waterman algorithm to dynamically filter keyword dictionaries and improve accuracy in Chinese contextual ASR.
- AfriVox-v2: A Domain-Verticalized Benchmark for In-the-Wild African Speech Recognition
  AfriVox-v2 is a benchmark that evaluates modern speech models on in-the-wild African audio with domain-specific tests for sectors including government, finance, health, and agriculture.
- BlasBench: An Open Benchmark for Irish Speech Recognition
  BlasBench supplies an Irish-aware normalizer and scoring harness that enables reproducible ASR comparisons and exposes a 33-43 point generalization gap for fine-tuned models versus 7-10 points for massively multilingual ones.
- OmniVoice: Towards Omnilingual Zero-Shot Text-to-Speech with Diffusion Language Models
  OmniVoice introduces a diffusion language model-style non-autoregressive TTS system that directly maps text to multi-codebook acoustic tokens, scaling zero-shot synthesis to over 600 languages with SOTA results on multilingual benchmarks using 581k hours of open data.
- The Annotation Scarcity Paradox in Low-Resource NLP Evaluation: A Decade of Acceleration and Emerging Constraints
  Introduces the Annotation Scarcity Paradox to describe how model scaling in low-resource NLP outpaces the human expertise required for authentic evaluation, threatening the validity of reported progress.
- NaijaS2ST: A Multi-Accent Benchmark for Speech-to-Speech Translation in Low-Resource Nigerian Languages