An end-to-end SLU architecture with frozen SSL acoustic encoder, LSTM classification head, and cross-modal distillation achieves 93% accuracy on simple commands and 82% on spontaneous speech at 7 ms latency on the new VoiceStick corpus, outperforming cascade baselines.
DistilHuBERT: Speech Representation Learning by Layer-wise Distillation of Hidden-unit BERT,
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
eess.AS 2years
2026 2representative citing papers
SEAM achieves 0.971 ROC-AUC on external interview data for real-time scripted speech detection by combining shortcut-prevention data techniques with a compact audio backbone.
citing papers explorer
-
SEAM: Shortcut-Aware Real-Time Detection of Scripted vs. Spontaneous Speech for Interview Guardrails
SEAM achieves 0.971 ROC-AUC on external interview data for real-time scripted speech detection by combining shortcut-prevention data techniques with a compact audio backbone.