Generalized Source Tracing: Detecting Novel Audio Deepfake Algorithm with Real Emphasis and Fake Dispersion Strategy

Haonnan Cheng; Jianhua Tao; Long Ye; Ruibo Fu; Xiaopeng Wang; Yuankun Xie; Zhengqi Wen; Zhiyong Wang

arxiv: 2406.03240 · v2 · pith:NFJSGK2Jnew · submitted 2024-06-05 · 💻 cs.SD · cs.AI · eess.AS

Yuankun Xie, Ruibo Fu, Zhengqi Wen, Zhiyong Wang, Xiaopeng Wang, Haonnan Cheng, Long Ye, Jianhua Tao

classification 💻 cs.SD cs.AI eess.AS

keywords deepfake, audio, novel, algorithms, algorithm, challenge, current, detection

With the proliferation of deepfake audio, there is an urgent need to investigate its attribution. Current source tracing methods can effectively distinguish in-distribution (ID) categories. However, the rapid evolution of deepfake algorithms poses a critical challenge for the accurate identification of out-of-distribution (OOD) novel deepfake algorithms. In this paper, we propose the Real Emphasis and Fake Dispersion (REFD) strategy for audio deepfake algorithm recognition and demonstrate its effectiveness in discriminating ID samples while identifying OOD samples. For effective OOD detection, we first explore current post-hoc OOD methods and then propose NSD, a novel OOD approach that identifies novel deepfake algorithms by considering the similarity of both feature and logit scores. REFD achieves an 86.83% F1-score as a single system in the Audio Deepfake Detection Challenge 2023 Track 3, showcasing its state-of-the-art performance.
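
The abstract does not spell out how NSD is computed, so the following is only a rough illustration of the general idea of post-hoc OOD rejection using both feature and logit similarity. It is not the authors' method. The function names, the use of cosine similarity, the equal-weight average, and the threshold are all assumptions made for this sketch.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length (eps avoids division by zero)."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def similarity_score(test_feats, test_logits, train_feats, train_logits):
    """Hypothetical ID-likeness score; higher means closer to the training data.

    For every test sample, compute the cosine similarity to each training
    sample in feature space and in logit space, average the two, and keep
    the best match over the training set.
    """
    feat_sim = l2_normalize(test_feats) @ l2_normalize(train_feats).T
    logit_sim = l2_normalize(test_logits) @ l2_normalize(train_logits).T
    combined = 0.5 * (feat_sim + logit_sim)  # assumed equal weighting
    return combined.max(axis=1)


def predict_with_rejection(test_feats, test_logits, train_feats, train_logits,
                           threshold, ood_label=-1):
    """Return the argmax ID class, or ood_label when the score is below threshold."""
    scores = similarity_score(test_feats, test_logits, train_feats, train_logits)
    id_preds = test_logits.argmax(axis=1)
    preds = np.where(scores >= threshold, id_preds, ood_label)
    return preds, scores
```

In practice the threshold would be tuned on held-out data, and the features and logits would come from a trained source tracing model. Both are left outside this sketch.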

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Dual-Branch Gated Fusion for Open-Set Audio Deepfake Source Tracing
cs.SD 2026-06 unverdicted novelty 6.0

A gated fusion of XLSR-53 and CORES features with energy margin and diversity losses reaches 97.6% ID accuracy and reduces FPR95 by 83.5% relative to the Interspeech 2025 baseline on MLAAD.