PVP models speaker-specific phoneme acoustic distributions with lightweight GMMs trained only on real speech to detect deepfakes of persons-of-interest, outperforming generic detectors and introducing a new Chinese POI dataset.
WavLM: Large-scale self-supervised pre-training for full stack speech processing. IEEE Journal of Selected Topics in Signal Processing, 16(6):1505–1518,
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.SD (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Profiling the Voice: Speaker-Specific Phoneme Fingerprinting for Speech Deepfake Detection