First dedicated ASR corpus for Puno Quechua (66 hours) with systematic benchmarks, built through participatory collection and openly released together with fine-tuned models.
hub
Omnilingual ASR: Open-source multilingual speech recognition for 1600+ languages
14 Pith papers cite this work. Polarity classification is still indexing.
years
2026: 14
roles
method: 1
polarities
use method: 1
representative citing papers
UrduSpeech is a 156-hour high-fidelity Urdu speech corpus with 12-dimension paralinguistic annotations, a 9-hour manually corrected benchmark, and open-source release to support speech technology for an under-resourced language.
Phoneme-based interfaces match or surpass projector-based ones for LLM ASR, especially in low-resource languages, and a BPE-phoneme hybrid offers additional improvements.
Multilingual ASR models show 39.7-297% zero-shot WER on Pashto public data, Whisper models output correct script in under 0.8% of cases, and fine-tuned models degrade to 32.5-59% WER on out-of-domain sets.
MADE is a new multilingual agentic diagnosing engine that produces higher-quality diagnostic reports (47% better than baseline) on a large-scale evaluation substrate covering 33 model families and 26 languages.
JSPG jointly combines semantic, pinyin, and glyph retrieval with an extended Smith-Waterman algorithm to dynamically filter keyword dictionaries and improve accuracy in Chinese contextual ASR.
AfriVox-v2 is a benchmark that evaluates modern speech models on in-the-wild African audio with domain-specific tests for sectors including government, finance, health, and agriculture.
BlasBench supplies an Irish-aware normalizer and scoring harness that enables reproducible ASR comparisons and exposes a 33-43 point generalization gap for fine-tuned models versus 7-10 points for massively multilingual ones.
OmniVoice introduces a diffusion language model-style non-autoregressive TTS system that directly maps text to multi-codebook acoustic tokens, scaling zero-shot synthesis to over 600 languages with SOTA results on multilingual benchmarks using 581k hours of open data.
SALSA adapts speech-aware LLMs via supervised layer-wise steering vectors, reporting up to 46.8% relative gains over zero-shot on out-of-domain speech benchmarks.
Introduces INSV-A automated screening benchmark for Pashto TTS systems reporting WER, script fidelity, and LID results across five systems on FLEURS and Common Voice prompts.
Case study finds that fine-tuned ASR models outperform human listeners on Dutch dysarthric continuous speech from one speaker, lowering WER from over 70% to over 23%.
A narrative survey of low-resource NLP evaluation identifies the Annotation Scarcity Paradox as a structural mismatch between scalable models and scarce sociolinguistic evaluation capacity.
citing papers explorer
- OmniVoice: Towards Omnilingual Zero-Shot Text-to-Speech with Diffusion Language Models