Distil-Whisper: Robust knowledge distillation via large-scale pseudo labelling
5 Pith papers cite this work. Polarity classification is still indexing.
Citation summary
- Years: 2026 (5)
- Roles: method (1)
- Polarities: use method (1)
citing papers explorer
- MURMUR: An Efficient Inference System for Long-Form ASR
  Murmur matches single-pass long-context ASR accuracy on AMI-IHM while cutting latency 4.2x by tuning chunk size and using intra-chunk attention sparsity via KV eviction.
- Logit Distillation on Manifolds: Mapping by Learning
  Presents a layer- and point-wise projection mapping for manifold-based logit distillation combined with LoRA to enable low-parameter student training with reported WER gains.
- Data-Efficient On-Policy Distillation for Automatic Speech Recognition
  On-policy distillation from a Qwen-ASR teacher improves a 0.6B Ark-ASR model over supervised fine-tuning and a same-scale baseline on four of five ASR benchmarks using 100k hours of speech.
- Too Good to Be True: A Study on Modern Automatic Speech Recognition for the Evaluation of Speech Enhancement
  Modern ASR models with noisy training and language models correlate better with human WER for speech enhancement evaluation than simpler models, yet their robustness makes them less suitable for purely acoustic assessments.
- SenBen: Sensitive Scene Graphs for Explainable Content Moderation