Synthetic speech source tracing using metric learning
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.SD (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Dual-Branch Gated Fusion for Open-Set Audio Deepfake Source Tracing
A gated fusion of XLSR-53 and CORES features with energy margin and diversity losses reaches 97.6% ID accuracy and reduces FPR95 by 83.5% relative to the Interspeech 2025 baseline on MLAAD.