AUDDT: A Unified Benchmark Toolkit for Audio and Speech Deepfake Detectors

Arthur Pimentel; Heitor R. Guimarães; Tiago Falk; Yi Zhu

arxiv: 2509.21597 · v2 · pith:TWHPHOORnew · submitted 2025-09-25 · 📡 eess.AS · cs.CL · cs.SD

Yi Zhu, Heitor R. Guimarães, Arthur Pimentel, Tiago Falk

classification 📡 eess.AS · cs.CL · cs.SD

keywords deepfake · audio · auddt · datasets · existing · toolkit · across · conditions

With the prevalence of artificial intelligence (AI)-generated content, such as audio deepfakes, a large body of recent work has focused on developing deepfake detection techniques. However, existing benchmarks employ a narrow set of datasets, leaving detector generalization to real-world conditions uncertain. In this paper, we systematically review 31 existing audio deepfake datasets and present an open-source benchmarking toolkit called AUDDT (https://github.com/MuSAELab/AUDDT). The goal of this toolkit is to automate the evaluation of pretrained detectors across a wide range of speech and non-speech audio datasets, giving users direct feedback on the advantages and shortcomings of their deepfake detectors under diverse manipulation types and recording conditions. We start by showcasing the usage of the developed toolkit, the composition of our benchmark, and the breakdown of different deepfake subgroups. Next, we highlight how AUDDT differs from existing benchmarking efforts by enabling large-scale, diverse evaluation across modern spoofing methods and richer attribute-level analysis through comprehensive metadata annotation. Using a widely adopted pretrained deepfake detector, we present in- and out-of-domain detection results, revealing notable performance variability across different conditions and audio manipulation types. Lastly, we also analyze the limitations of these existing datasets and their gaps relative to practical deployment scenarios.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Alethia: A Foundational Encoder for Voice Deepfakes
cs.SD 2026-04 unverdicted novelty 6.0

Alethia is a pretrained audio encoder using continuous embedding prediction and generative flow-matching reconstruction that outperforms existing speech foundation models on voice deepfake tasks with better robustness...