Distil-Whisper: Robust knowledge distillation via large-scale pseudo labelling

Sanchit Gandhi, Patrick von Platen, Alexander M · 2023 · arXiv 2311.00430

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: method 1

citation-polarity summary: use method 1

representative citing papers

MURMUR: An Efficient Inference System for Long-Form ASR

cs.LG · 2026-05-31 · conditional · novelty 6.0

Murmur matches single-pass long-context ASR accuracy on AMI-IHM while cutting latency 4.2x by tuning chunk size and using intra-chunk attention sparsity via KV eviction.

Logit Distillation on Manifolds: Mapping by Learning

cs.LG · 2026-05-30 · unverdicted · novelty 3.0

Presents a layer- and point-wise projection mapping for manifold-based logit distillation combined with LoRA to enable low-parameter student training with reported WER gains.

Data-Efficient On-Policy Distillation for Automatic Speech Recognition

cs.AI · 2026-05-27 · unverdicted · novelty 3.0

On-policy distillation from a Qwen-ASR teacher improves a 0.6B Ark-ASR model over supervised fine-tuning and a same-scale baseline on four of five ASR benchmarks using 100k hours of speech.

Too Good to Be True: A Study on Modern Automatic Speech Recognition for the Evaluation of Speech Enhancement

eess.AS · 2026-05-12 · unverdicted · novelty 3.0

Modern ASR models with noisy training and language models correlate better with human WER for speech enhancement evaluation than simpler models, yet their robustness makes them less suitable for purely acoustic assessments.

SenBen: Sensitive Scene Graphs for Explainable Content Moderation

cs.CV · 2026-04-09

citing papers explorer

MURMUR: An Efficient Inference System for Long-Form ASR
cs.LG · 2026-05-31 · conditional · none · ref 21
Logit Distillation on Manifolds: Mapping by Learning
cs.LG · 2026-05-30 · unverdicted · none · ref 1
Data-Efficient On-Policy Distillation for Automatic Speech Recognition
cs.AI · 2026-05-27 · unverdicted · none · ref 6
Too Good to Be True: A Study on Modern Automatic Speech Recognition for the Evaluation of Speech Enhancement
eess.AS · 2026-05-12 · unverdicted · none · ref 39
SenBen: Sensitive Scene Graphs for Explainable Content Moderation
cs.CV · 2026-04-09 · unreviewed · ref 8
