The First Environmental Sound Deepfake Detection Challenge: Benchmarking Robustness, Evaluation, and Insights

Han Yin; Jisheng Bai; Rohan Kumar Das; Ting Dang; Yang Xiao

arxiv: 2603.04865 · v4 · pith:B32YQHRPnew · submitted 2026-03-05 · 💻 cs.SD

Han Yin, Yang Xiao, Rohan Kumar Das, Jisheng Bai, Ting Dang

classification 💻 cs.SD

keywords esdd, challenge, deepfake, detection, environmental, evaluation, first, insights

Recent progress in audio generation has made it increasingly easy to create highly realistic environmental soundscapes, which can be misused to produce deceptive content, such as fake alarms, gunshots, and crowd sounds, raising concerns for public safety and trust. While deepfake detection for speech and singing voice has been extensively studied, environmental sound deepfake detection (ESDD) remains underexplored. To advance ESDD, the first edition of the ESDD challenge was launched, attracting 97 registered teams and receiving 1,748 valid submissions. This paper presents the task formulation, dataset construction, evaluation protocols, baseline systems, and key insights from the challenge results. Furthermore, we analyze common architectural choices and training strategies among top-performing systems. Finally, we discuss potential future research directions for ESDD, outlining key opportunities and open problems to guide subsequent studies in this field.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Overview of ESDD2: Environment-Aware Speech and Sound Deepfake Detection Challenge
cs.SD 2026-06 unverdicted novelty 5.0

The ESDD2 challenge evaluated 13 teams on component-level audio spoofing detection, with the top system reaching 0.8775 Macro-F1 by using modular decomposition, self-supervised encoders, and targeted augmentation.

Overview of ESDD2: Environment-Aware Speech and Sound Deepfake Detection Challenge
cs.SD 2026-06 unverdicted novelty 2.0

ESDD2 challenge overview reports top Macro-F1 of 0.8775 from 13 teams using modular designs and self-supervised encoders, with noted difficulties on environmental sounds and unseen generators.