ADD 2022: The First Audio Deep Synthesis Detection Challenge
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.SD (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Deepfake Audio Detection Using Self-supervised Fusion Representations
A dual-branch fusion model with XLS-R, BEATs, Matching Head, and cross-attention achieves 70.20% F1-score and 16.54% environmental EER on CompSpoofV2, outperforming the baseline for component-level deepfake detection.