ASVspoof 2021: accelerating progress in spoofed and deepfake speech detection

Junichi Yamagishi, Xin Wang, Massimiliano Todisco, Md Sahidullah, Jose Patino, Andreas Nautsch, Xuechen Liu, Kong Aik Lee, Tomi Kinnunen, Nicholas Evans, Héctor Delgado

arXiv: 2109.00537 · v1 · submitted 2021-09-01 · eess.AS · cs.CR · cs.LG · cs.SD

Abstract

ASVspoof 2021 is the fourth edition in the series of biennial challenges which aim to promote the study of spoofing and the design of countermeasures to protect automatic speaker verification systems from manipulation. In addition to a continued focus upon logical and physical access tasks, in which there are a number of advances compared to previous editions, ASVspoof 2021 introduces a new task involving deepfake speech detection. This paper describes all three tasks, the new databases for each of them, the evaluation metrics, four challenge baselines, the evaluation platform and a summary of challenge results. Despite the introduction of channel and compression variability, which compound the difficulty, results for the logical access and deepfake tasks are close to those from previous ASVspoof editions. Results for the physical access task show the difficulty in detecting attacks in real, variable physical spaces. ASVspoof 2021 is the first edition for which participants were not provided with any matched training or development data, reflecting real conditions in which the nature of spoofed and deepfake speech can never be predicted with confidence; against that backdrop, the results are extremely encouraging and demonstrate the substantial progress made in the field in recent years.
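The abstract mentions evaluation metrics without defining them. One metric widely used in spoofing and deepfake detection, and the one quoted by several of the citing papers listed here, is the equal error rate (EER): the operating point at which the rate of spoofed trials accepted as bona fide equals the rate of bona fide trials rejected. The sketch below is a minimal illustration under stated assumptions, not the official ASVspoof scoring code: it assumes higher scores indicate bona fide speech, the function name `compute_eer` is hypothetical, and it simply tries every observed score as a threshold and keeps the one where the two error rates are closest.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Return (eer, threshold) by sweeping every observed score as a threshold.

    Assumption: higher scores mean "more likely bona fide", and a trial is
    accepted as bona fide when its score is >= the threshold.
    """
    bonafide = [float(s) for s in bonafide_scores]
    spoof = [float(s) for s in spoof_scores]
    if not bonafide or not spoof:
        raise ValueError("need at least one bona fide and one spoof score")

    best = None  # (gap between error rates, EER estimate, threshold)
    for t in sorted(set(bonafide + spoof)):
        # False rejection rate: bona fide trials scored below the threshold.
        frr = sum(s < t for s in bonafide) / len(bonafide)
        # False acceptance rate: spoofed trials scored at or above the threshold.
        far = sum(s >= t for s in spoof) / len(spoof)
        gap = abs(frr - far)
        if best is None or gap < best[0]:
            # Report the midpoint of the two rates at the closest crossing.
            best = (gap, (frr + far) / 2, t)
    return best[1], best[2]


eer, threshold = compute_eer([0.9, 0.8, 0.7, 0.4], [0.1, 0.2, 0.3, 0.6])
print(f"EER = {eer:.2%} at threshold {threshold}")  # prints: EER = 25.00% at threshold 0.6
```

With these toy scores, one bona fide trial (0.4) falls below the selected threshold of 0.6 and one spoofed trial (0.6) sits at it, so both error rates are 0.25. A threshold sweep like this is exact only up to the granularity of the observed scores.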

Forward citations

Cited by 8 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

The Watermark Shortcut: How Provenance Marking Sabotages Audio Deepfake Detection
cs.SD 2026-06 unverdicted novelty 8.0

Watermarking only synthetic audio leads deepfake detectors to use the watermark as a spurious shortcut, causing generalization failure, evasion by removing watermarks, and false positives on watermarked real audio.

Linguistically Augmented Audio Speech Data (LinguAS)
cs.SD 2026-06 unverdicted novelty 7.0

Introduces the LinguAS dataset of genuine and deepfaked audio annotated with expert-defined linguistic features to improve detection model performance over ASVspoof 2021 and SSL baselines.

EchoFake: A Replay-Aware Dataset for Practical Speech Deepfake Detection
eess.AS 2025-10 unverdicted novelty 7.0

EchoFake is a new replay-aware dataset combining zero-shot TTS deepfakes and physical replay recordings to improve generalization of speech deepfake detection models over existing lab-focused datasets.

Alethia: A Foundational Encoder for Voice Deepfakes
cs.SD 2026-04 unverdicted novelty 6.0

Alethia is a pretrained audio encoder using continuous embedding prediction and generative flow-matching reconstruction that outperforms existing speech foundation models on voice deepfake tasks with better robustness...

MLAAD: The Multi-Language Audio Anti-Spoofing Dataset
cs.SD 2024-01 unverdicted novelty 6.0

MLAAD provides a large-scale multi-language synthetic audio dataset for training and evaluating audio anti-spoofing models, showing better training performance than InTheWild and FakeOrReal and alternating superiority...

Gender Fairness in Audio Deepfake Detection: Performance and Disparity Analysis
cs.SD 2026-03 unverdicted novelty 5.0

Fairness metrics uncover gender disparities in audio deepfake detection error distributions that standard Equal Error Rate metrics obscure.

Detecting Audio Deepfakes on the Edge: Lightweight SSL-Based Detection in a Browser Plugin
eess.AS 2026-06 unverdicted novelty 4.0

Truncated SSL backbone with logistic classifier detects audio deepfakes on-device, claimed to outperform AASIST by 10% while running 40% faster, packaged as a browser plugin.

SpAArSIST: Sparsified AASIST for Efficient and Reliable Anti-Spoofing
cs.SD 2026-06 conditional novelty 3.0

SpAArSIST sparsifies AASIST by swapping learned pooling for explicit magnitude-based scoring and mean aggregation, cutting compute 20.7% and improving In-the-Wild EER to 2.82%.