CleanCodec reframes audio tokenization as a selective information bottleneck that encodes only perceptually important features at 12.5 tokens per second, outperforming prior codecs in efficiency, speaker similarity, and intelligibility.
FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech
6 Pith papers cite this work.
representative citing papers
Listeners detect automatic anonymization in pathological speech at 91-93% accuracy with a 30-point perceived quality drop, yet clinical severity ratings stay nearly unchanged for dysarthria, dysglossia, and dysphonia.
Introduces INSV-A, an automated screening benchmark for Pashto TTS systems, and reports WER, script fidelity, and LID results for five systems on FLEURS and Common Voice prompts.
HydraQE is a new end-to-end speech translation QE system that combines a Qwen3-ASR backbone, sparsemax layer mixing, a bidirectional Transformer, and multi-task curriculum training on human and pseudo-labels; it outperforms cascaded baselines.
The study shows that clinical AI accuracy collapses from 89% to 62% on X-rays under imperceptible adversarial perturbations, and from 85% to 55% on clinical cases in Nigerian Pidgin and Yoruba-inflected English.
A survey catalogs text and speech resources for Hausa and Fongbe, documenting sizes, domains, licensing, and gaps including limited Fongbe text diversity and missing Hausa speech corpora.
citing papers explorer
Perceptual implications of automatic anonymization in pathological speech