Does Audio Deepfake Detection Generalize?
6 Pith papers cite this work. Polarity classification is still indexing.
verdicts: UNVERDICTED 6
roles: background 1
polarities: background 1
representative citing papers
Spoof-SUPERB benchmark shows large-scale discriminative SSL models such as XLS-R, UniSpeech-SAT, and WavLM Large outperform others in audio deepfake detection and maintain robustness under acoustic degradations.
EchoFake is a new replay-aware dataset combining zero-shot TTS deepfakes and physical replay recordings to improve generalization of speech deepfake detection models over existing lab-focused datasets.
APC embeds compact Ed25519 signatures into audio phase data with error correction to achieve 97.5-98.3% cryptographic verification under eight attack types at mean PESQ 3.02.
ProSDD learns speaker-conditioned prosodic variation from real speech via supervised masked prediction and jointly optimizes it with spoof detection, cutting EER substantially on ASVspoof 2024 and emotional datasets.
Cosine similarity in SupCon with a delayed negative queue on wav2vec2 XLS-R yields the lowest equal error rates for deepfake audio detection on in-the-wild and pooled evaluations.
citing papers explorer
- V.O.I.C.E (Voice, Ownership, Identity, Control, Expression): Risk Taxonomy of Synthetic Voice Generation From Empirical Data
  V.O.I.C.E is a new taxonomy that organizes synthetic voice risks into five categories and shows how they interact with exposure, visibility, and legal context using empirical incident data.
- A SUPERB-Style Benchmark of Self-Supervised Speech Models for Audio Deepfake Detection
  Spoof-SUPERB benchmark shows large-scale discriminative SSL models such as XLS-R, UniSpeech-SAT, and WavLM Large outperform others in audio deepfake detection and maintain robustness under acoustic degradations.
- EchoFake: A Replay-Aware Dataset for Practical Speech Deepfake Detection
  EchoFake is a new replay-aware dataset combining zero-shot TTS deepfakes and physical replay recordings to improve generalization of speech deepfake detection models over existing lab-focused datasets.
- Asymmetric Phase Coding Audio Watermarking
  APC embeds compact Ed25519 signatures into audio phase data with error correction to achieve 97.5-98.3% cryptographic verification under eight attack types at mean PESQ 3.02.
- ProSDD: Learning Prosodic Representations for Speech Deepfake Detection against Expressive and Emotional Attacks
  ProSDD learns speaker-conditioned prosodic variation from real speech via supervised masked prediction and jointly optimizes it with spoof detection, cutting EER substantially on ASVspoof 2024 and emotional datasets.
- Similarity Choice and Negative Scaling in Supervised Contrastive Learning for Deepfake Audio Detection
  Cosine similarity in SupCon with a delayed negative queue on wav2vec2 XLS-R yields the lowest equal error rates for deepfake audio detection on in-the-wild and pooled evaluations.