Personalized Keyword Spotting for User-Defined Keywords Leveraging Text-Independent Speaker Verification

2026 · eess.AS · arXiv 2606.20106

1 Pith paper cites this work. Polarity classification is still indexing.



abstract

User-defined keyword spotting (UD-KWS) enables zero-shot wake-word detection from text, but existing systems learn speaker-invariant representations that cannot reject impostors uttering the correct keyword. We address this dual zero-shot setting -- unseen keywords and unseen speakers -- with ZP-KWS, a lightweight framework combining a phoneme-supervised audio encoder with a GE2E-pretrained compact speaker encoder (about 0.9M parameters). Multiplicative late fusion at inference grants each branch independent veto power, supporting modes from conventional detection to strict speaker-gated activation without retraining. On LibriPhrase, Google Speech Commands, and Qualcomm datasets, ZP-KWS reduces target-only FRR at 1% FAR by up to 60% relative to the strongest baseline while maintaining competitive keyword detection, all within a 1.55M parameter budget for edge deployment.
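The "independent veto power" of multiplicative late fusion can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the assumption that both branch scores lie in [0, 1], and the `speaker_weight` knob used to move between conventional and speaker-gated detection are all hypothetical, since the abstract does not specify how the modes are selected.

```python
def fuse_scores(keyword_score: float, speaker_score: float,
                speaker_weight: float = 1.0) -> float:
    """Multiplicatively fuse a keyword score and a speaker score.

    Both scores are assumed to lie in [0, 1] (an assumption, not stated
    in the abstract). `speaker_weight` is a hypothetical mode knob:
    0.0 ignores the speaker branch (conventional detection) and
    1.0 applies the full speaker gate.
    """
    if not 0.0 <= speaker_weight <= 1.0:
        raise ValueError("speaker_weight must be in [0, 1]")
    # Interpolate the speaker term toward 1.0 so that weight 0 leaves
    # the keyword score unchanged.
    speaker_term = 1.0 - speaker_weight + speaker_weight * speaker_score
    # The product is low whenever either factor is low, so each branch
    # can veto the activation on its own.
    return keyword_score * speaker_term


def is_activated(keyword_score: float, speaker_score: float,
                 threshold: float = 0.5, speaker_weight: float = 1.0) -> bool:
    """Return True if the fused score reaches a (hypothetical) threshold."""
    return fuse_scores(keyword_score, speaker_score, speaker_weight) >= threshold


# An impostor saying the correct keyword: strong keyword match, weak speaker match.
impostor_gated = is_activated(0.9, 0.1, speaker_weight=1.0)   # speaker branch vetoes
impostor_ungated = is_activated(0.9, 0.1, speaker_weight=0.0)  # keyword-only mode
```

Because the mode is chosen only at inference time through the fusion step, switching between these behaviours in the sketch requires no change to either score-producing branch, which mirrors the "without retraining" claim in the abstract.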

representative citing papers

Personalized Keyword Spotting for User-Defined Keywords Leveraging Text-Independent Speaker Verification

eess.AS · 2026-06-18 · unverdicted · novelty 5.0

ZP-KWS combines a phoneme-supervised audio encoder with a 0.9M-parameter GE2E speaker encoder and multiplicative late fusion to cut target-only FRR at 1% FAR by up to 60% on LibriPhrase, Google Speech Commands, and Qualcomm datasets while staying under 1.55M total parameters.

citing papers explorer

Showing 1 of 1 citing paper.

Personalized Keyword Spotting for User-Defined Keywords Leveraging Text-Independent Speaker Verification eess.AS · 2026-06-18 · unverdicted · none · ref 2 · internal anchor
