AASIST: Audio anti-spoofing using integrated spectro-temporal graph attention networks
2 Pith papers cite this work, both from 2026. Neither has a verdict yet; polarity classification is still indexing.
Citing papers
- ProSDD: Learning Prosodic Representations for Speech Deepfake Detection against Expressive and Emotional Attacks
  ProSDD learns speaker-conditioned prosodic variation from real speech via supervised masked prediction and jointly optimizes it with spoof detection, cutting EER substantially on ASVspoof 2024 and emotional datasets.
- Diffusion Reconstruction towards Generalizable Audio Deepfake Detection
  Diffusion reconstruction creates hard samples for audio deepfake detection training, and when paired with feature aggregation and RACL, it reduces average EER versus baselines.