Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
13 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (13)
roles: method (1)
polarities: use method (1)
citing papers explorer
- Building Community-Centred NLP Resources for Puno Quechua
  First dedicated ASR corpus of 66 hours and systematic benchmarks for Puno Quechua using participatory collection and open release of data and fine-tuned models.
- UrduSpeech: A 156-Hour Urdu Speech Corpus with 12-Dimension Paralinguistic Annotations
  UrduSpeech is a 156-hour high-fidelity Urdu speech corpus with 12-dimension paralinguistic annotations, a 9-hour manually corrected benchmark, and open-source release to support speech technology for an under-resourced language.
- Phonemes vs. Projectors: An Investigation of Speech-Language Interfaces for LLM-based ASR
  Phoneme-based interfaces match or surpass projector-based ones for LLM ASR, especially in low-resource languages, and a BPE-phoneme hybrid offers additional improvements.
- Benchmarking Multilingual Speech Models on Pashto: Zero-Shot ASR, Script Failure, and Cross-Domain Evaluation
  Multilingual ASR models show 39.7-297% zero-shot WER on Pashto public data, Whisper models output correct script in under 0.8% of cases, and fine-tuned models degrade to 32.5-59% WER on out-of-domain sets.
- JSPG: Dynamic Dictionary Filtering via Joint Semantic-Pinyin-Glyph Retrieval for Chinese Contextual ASR
  JSPG jointly combines semantic, pinyin, and glyph retrieval with an extended Smith-Waterman algorithm to dynamically filter keyword dictionaries and improve accuracy in Chinese contextual ASR.
- AfriVox-v2: A Domain-Verticalized Benchmark for In-the-Wild African Speech Recognition
  AfriVox-v2 is a benchmark that evaluates modern speech models on in-the-wild African audio with domain-specific tests for sectors including government, finance, health, and agriculture.
- BlasBench: An Open Benchmark for Irish Speech Recognition
  BlasBench supplies an Irish-aware normalizer and scoring harness that enables reproducible ASR comparisons and exposes a 33-43 point generalization gap for fine-tuned models versus 7-10 points for massively multilingual ones.
- OmniVoice: Towards Omnilingual Zero-Shot Text-to-Speech with Diffusion Language Models
  OmniVoice introduces a diffusion language model-style non-autoregressive TTS system that directly maps text to multi-codebook acoustic tokens, scaling zero-shot synthesis to over 600 languages with SOTA results on multilingual benchmarks using 581k hours of open data.
- SALSA: Speech Aware LLM Adaptation via Learned Steering Activation Vectors
  SALSA adapts speech-aware LLMs via supervised layer-wise steering vectors, reporting up to 46.8% relative gains over zero-shot on out-of-domain speech benchmarks.
- PashtoTTS-Bench: automated screening for low-resource non-Latin-script text-to-speech
  Introduces INSV-A automated screening benchmark for Pashto TTS systems reporting WER, script fidelity, and LID results across five systems on FLEURS and Common Voice prompts.
- Comparing Human and Automatic Recognition of Dutch Dysarthric Continuous Speech: A Case Study
  Case study finds that fine-tuned ASR models outperform human listeners on Dutch dysarthric continuous speech from one speaker, lowering WER from over 70% to over 23%.
- The Annotation Scarcity Paradox in Low-Resource NLP Evaluation: A Decade of Acceleration and Emerging Constraints
  A narrative survey of low-resource NLP evaluation identifies the Annotation Scarcity Paradox as a structural mismatch between scalable models and scarce sociolinguistic evaluation capacity.
- NaijaS2ST: A Multi-Accent Benchmark for Speech-to-Speech Translation in Low-Resource Nigerian Languages