DONUT: CTC-based Query-by-Example Keyword Spotting

Loren Lugosch; Samuel Myer; Vikrant Singh Tomar

arxiv: 1811.10736 · v1 · pith:ZGGQNHAWnew · submitted 2018-11-26 · cs.LG · cs.SD · eess.AS · stat.ML

keywords: keyword, wakeword, ctc-based, donut, query-by-example, spotting, algorithm, custom

Keyword spotting, or wakeword detection, is an essential feature for hands-free operation of modern voice-controlled devices. With such devices becoming ubiquitous, users might want to choose a personalized custom wakeword. In this work, we present DONUT, a CTC-based algorithm for online query-by-example keyword spotting that enables custom wakeword detection. The algorithm works by recording a small number of training examples from the user, generating a set of label sequence hypotheses from these training examples, and detecting the wakeword by aggregating the scores of all the hypotheses given a new audio recording. Our method combines the generalization and interpretability of CTC-based keyword spotting with the user-adaptation and convenience of a conventional query-by-example system. DONUT has low computational requirements and is well-suited for both learning and inference on embedded systems without requiring private user data to be uploaded to the cloud.
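
The detection step described in the abstract (scoring each label sequence hypothesis against a new recording, then aggregating) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the hypotheses are assumed to have been generated already from the enrollment examples, each hypothesis is scored with the standard CTC forward algorithm, and the aggregation is assumed here to be a log-sum-exp over hypothesis scores. The function names `ctc_log_likelihood` and `wakeword_score` are hypothetical.

```python
import math

NEG_INF = float("-inf")


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of log-values."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def ctc_log_likelihood(log_probs, labels, blank=0):
    """Standard CTC forward algorithm: log P(labels | audio).

    log_probs: list of T frames, each a list of per-symbol log-posteriors.
    labels: non-empty list of symbol indices, without blanks.
    """
    # Interleave blanks: [a, b] -> [blank, a, blank, b, blank]
    ext = [blank]
    for s in labels:
        ext += [s, blank]
    S = len(ext)

    # Paths may start in the leading blank or in the first label.
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]
    alpha[1] = log_probs[0][ext[1]]

    for t in range(1, len(log_probs)):
        new = [NEG_INF] * S
        for s in range(S):
            terms = [alpha[s]]
            if s >= 1:
                terms.append(alpha[s - 1])
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                terms.append(alpha[s - 2])
            new[s] = logsumexp(terms) + log_probs[t][ext[s]]
        alpha = new

    # Paths may end in the last label or in the trailing blank.
    return logsumexp([alpha[S - 1], alpha[S - 2]])


def wakeword_score(log_probs, hypotheses):
    """Aggregate per-hypothesis scores (assumption: log-sum-exp)."""
    return logsumexp([ctc_log_likelihood(log_probs, h) for h in hypotheses])


# Toy input: 2 frames, 3 symbols (index 0 is the CTC blank), uniform posteriors.
frames = [[math.log(1 / 3)] * 3 for _ in range(2)]
hypotheses = [[1], [1, 2]]  # hypothetical label sequences from enrollment
score = wakeword_score(frames, hypotheses)
```

In a deployed detector the resulting score would be compared against a tuned threshold to decide whether the wakeword was spoken; the threshold and the exact aggregation rule are design choices that this sketch does not take from the paper.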

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Effective User-defined Keyword Spotting with Dual-stage Matching, Multi-modal Enrollment, and Continual Adaptation
eess.AS · 2026-05 · unverdicted · novelty 5.0

DMA-KWS achieves 97.85% AUC and 6.13% EER on LibriPhrase Hard via dual-stage CTC/QbyT matching, multi-modal enrollment, and lightweight continual adaptation with 187k parameters.