Advancing Zero-Shot Open-Set Speech Deepfake Source Tracing

Jagabandhu Mishra; Manasi Chhibber; Tomi H. Kinnunen

arxiv: 2509.24674 · v2 · submitted 2025-09-29 · 📡 eess.AS

Pith reviewed 2026-05-18 12:41 UTC · model grok-4.3

keywords: deepfake source tracing · zero-shot learning · open-set recognition · speech deepfakes · attack verification · AASIST embeddings · STOPA dataset · equal error rate

The pith

Zero-shot cosine scoring outperforms few-shot methods for out-of-distribution deepfake source tracing while few-shot leads on in-distribution trials.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops a framework for tracing the source of speech deepfakes even when the attack type is unseen during training. It adapts SSL-AASIST embeddings using angular additive margin loss and RegMixup while enforcing complete separation between training attacks and those used to form fingerprint-trial pairs. Experiments on the STOPA dataset in an open-set setting demonstrate that few-shot Siamese and MLP backends achieve lower equal error rates on in-distribution trials, but zero-shot cosine similarity performs better on out-of-distribution trials.

Core claim

The adapted SSL-AASIST embeddings support open-set attack source verification, with few-shot Siamese and MLP reaching EERs of 17.72% and 13.11% on ID trials compared to 29.91% for zero-shot cosine scoring, while zero-shot cosine scoring reaches 16.43% EER on OOD trials, outperforming few-shot Siamese at 23.47% and MLP at 21.57%.
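For readers less familiar with the metric, the EER figures quoted above correspond to the operating point where the false-acceptance rate on non-target trials equals the false-rejection rate on target trials. The following is a minimal sketch of that computation, not the authors' evaluation code; the function name `compute_eer` and the toy scores are illustrative assumptions.

```python
import numpy as np

def compute_eer(target_scores, nontarget_scores):
    """Return (eer, threshold) at the point where FAR and FRR are closest.

    target_scores: scores of trials whose fingerprint and trial share a source.
    nontarget_scores: scores of trials with mismatched sources.
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    # Every observed score is a candidate threshold.
    thresholds = np.sort(np.concatenate([target, nontarget]))
    # A trial is accepted when its score is >= threshold.
    frr = np.array([(target < t).mean() for t in thresholds])
    far = np.array([(nontarget >= t).mean() for t in thresholds])
    idx = int(np.argmin(np.abs(far - frr)))
    return float((far[idx] + frr[idx]) / 2.0), float(thresholds[idx])

# Toy usage with made-up scores (not data from the paper).
eer, threshold = compute_eer([0.2, 0.8], [0.1, 0.9])
print(f"EER = {100 * eer:.2f}% at threshold {threshold:.2f}")
```

With few trials the FAR and FRR curves may not cross exactly, which is why this sketch averages the two rates at the closest point.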

What carries the argument

Adapted SSL-AASIST embeddings enhanced with AAM loss and RegMixup for attack classification, paired with zero-shot cosine or few-shot Siamese and MLP scoring backends for verification.
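The zero-shot cosine backend named above can be illustrated with a short sketch. It assumes, in the style of speaker verification, that an attack fingerprint is the mean of length-normalized enrollment embeddings; the summary here does not confirm how the paper builds fingerprints, and the function names `build_fingerprint` and `cosine_score` are hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length."""
    x = np.asarray(x, dtype=float)
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)

def build_fingerprint(enroll_embeddings):
    """Assumed fingerprint: mean of unit-length enrollment embeddings."""
    return l2_normalize(l2_normalize(enroll_embeddings).mean(axis=0))

def cosine_score(fingerprint, trial_embedding):
    """Higher score means the trial more likely shares the fingerprint's source."""
    return float(np.dot(l2_normalize(fingerprint), l2_normalize(trial_embedding)))

# Toy 2-D embeddings; real embeddings would come from the adapted encoder.
fingerprint = build_fingerprint([[1.0, 0.1], [0.9, -0.1]])
print(cosine_score(fingerprint, [2.0, 0.0]))   # trial pointing the same way
print(cosine_score(fingerprint, [0.0, 1.0]))   # trial pointing elsewhere
```

Because this backend has no trainable parameters, it cannot overfit to the attacks seen during backend training, which is one plausible reading of why it holds up better on OOD trials.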

If this is right

Few-shot backends should be selected for source tracing when test attacks match the training distribution.
Zero-shot cosine scoring is preferable when encountering entirely new attack types.
Maintaining attack disjointness during training is necessary to validate generalization in open-set conditions.
Hybrid systems could switch between zero-shot and few-shot scoring depending on observed distribution shift.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Real-world deployment may require automatic detection of whether a new deepfake belongs to the known or unknown attack distribution.
The approach could extend to attributing deepfakes to specific generation tools beyond the current dataset.
Combining embedding adaptation with distribution-aware backend selection might improve robustness across evolving attack landscapes.

Load-bearing premise

Training attacks can be kept completely disjoint from fingerprint-trial pairs while the embeddings still generalize to trace unseen attack sources.
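One way to make this premise checkable is to report overlap between the training and evaluation attack sets, both at the attack level and at the component level. The sketch below assumes each attack is described by an acoustic-model id and a vocoder id, as the STOPA labels are described in the reference excerpts on this page; the ids used here (`A01`, `am1`, `vm1`, and so on) and the function name `overlap_report` are made up for illustration.

```python
def overlap_report(train_attacks, eval_attacks):
    """Report shared attack ids and shared components between two attack sets.

    Each argument maps an attack id to an (acoustic_model_id, vocoder_id) pair.
    An empty list under a key means no overlap at that level.
    """
    train_ams = {am for am, _ in train_attacks.values()}
    train_vms = {vm for _, vm in train_attacks.values()}
    eval_ams = {am for am, _ in eval_attacks.values()}
    eval_vms = {vm for _, vm in eval_attacks.values()}
    return {
        "attack": sorted(set(train_attacks) & set(eval_attacks)),
        "acoustic_model": sorted(train_ams & eval_ams),
        "vocoder": sorted(train_vms & eval_vms),
    }

# Hypothetical partition: attack ids are disjoint, but one acoustic model is shared.
train = {"A01": ("am1", "vm1"), "A02": ("am2", "vm2")}
evaluation = {"A03": ("am1", "vm3")}
print(overlap_report(train, evaluation))
```

In this toy partition the attack ids are disjoint while an acoustic model is shared, which is the kind of indirect exposure that attack-level disjointness alone would not reveal.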

What would settle it

Zero-shot cosine scoring failing to achieve lower EER than few-shot methods in OOD trials on an independent dataset with new disjoint attacks.

Original abstract

We propose a novel zero-shot source tracing framework inspired by speaker verification. We adapt SSL-AASIST for attack classification, enhancing embeddings with AAM loss and RegMixup, and ensure that training attacks are disjoint from those forming fingerprint-trial pairs. For backend scoring in attack verification, we explore both zero-shot approaches (cosine similarity and Siamese) and few-shot approaches (MLP and Siamese). Experiments on our recently introduced STOPA dataset with an open set setting show that few-shot learning provides advantages in the in-distribution (ID) scenario, while zero-shot approaches perform better in the out-of-distribution (OOD) scenario. In attack source verification with ID trials, few-shot Siamese and MLP achieve equal error rates (EER) of 17.72% and 13.11%, compared to 29.91% for zero-shot cosine scoring. Conversely, in OOD trials, zero-shot cosine scoring reaches 16.43%, outperforming few-shot Siamese at 23.47% and MLP at 21.57%.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper reports a useful empirical split where few-shot backends win on in-distribution deepfake source tracing but zero-shot cosine scoring wins out-of-distribution, yet the open-set claim depends on an unverified disjointness assumption that may not fully hold.

The main point is that this work shows few-shot MLP and Siamese backends reaching 13.11% and 17.72% EER on in-distribution attack source verification trials, while zero-shot cosine scoring reaches 16.43% EER on out-of-distribution trials and beats the few-shot options there. The authors adapt SSL-AASIST with additive angular margin loss and RegMixup, then enforce a training protocol where the attacks used for embedding learning stay separate from those forming the fingerprint-trial pairs. This produces a clean comparison of zero-shot versus few-shot scoring on the STOPA dataset under an explicit open-set setup.

The numbers are concrete and the ID/OOD split is a reasonable way to probe generalization, so the experimental design gives a practical baseline for anyone working on audio attribution. The extension of speaker verification tools to deepfake source tracing is straightforward and the disjoint-attack condition is stated clearly enough to replicate the protocol. Credit for shipping specific EER figures on both splits rather than vague claims.

The soft spot is the leakage risk the stress-test note flags. Even with training attacks held out, the upstream self-supervised model was pretrained on broad audio corpora that could contain similar synthesis artifacts or generator families, and the adaptation step might pick up indirect cues. The paper asserts disjointness but does not appear to include checks such as generator overlap analysis or artifact similarity tests, so the OOD advantage for zero-shot scoring could partly reflect residual exposure rather than pure open-set behavior. Without those controls or error bars on the reported EERs, the generalization story is harder to trust at face value.

This is for audio forensics groups or misinformation researchers who need working baselines for source tracing rather than theoretical advances. A reader already familiar with AASIST-style models and open-set verification would get the most out of the backend comparisons. It deserves peer review because the protocol is explicit and the results are falsifiable, even if the leakage concern needs tighter experimental handling in revision.

Referee Report

1 major / 2 minor

Summary. The paper proposes a zero-shot open-set source tracing framework for speech deepfakes, adapting SSL-AASIST embeddings with AAM loss and RegMixup while keeping training attacks disjoint from fingerprint-trial pairs. It evaluates zero-shot (cosine similarity, Siamese) and few-shot (MLP, Siamese) backends for attack verification on the STOPA dataset under open-set conditions, claiming that few-shot methods yield lower EER in ID trials (MLP at 13.11%, Siamese at 17.72%) while zero-shot cosine scoring outperforms in OOD trials (16.43% vs. 21.57-23.47%).

Significance. If the strict disjointness and absence of leakage hold, the work offers a useful empirical comparison showing that backend choice should depend on whether the scenario is ID or OOD, advancing forensic tools for deepfake attribution beyond closed-set assumptions.

major comments (1)

[Experimental setup / attack selection] Experimental setup (abstract and methods description of attack selection): The central open-set claim rests on the assertion that training attacks are kept completely disjoint from those forming fingerprint-trial pairs. However, no explicit validation, overlap analysis, or checks against indirect exposure via SSL pretraining data or shared synthesis artifacts in the adaptation stage are provided. This directly affects whether the reported EER gaps (e.g., few-shot 13.11% ID vs. zero-shot 16.43% OOD) demonstrate true generalization rather than partial leakage.

minor comments (2)

[Results] Results section: The reported EER values lack error bars, confidence intervals, or statistical significance tests, making it difficult to assess the robustness of the ID/OOD performance differences.
[Methods] Methods: No ablation studies are described that isolate the contributions of AAM loss versus RegMixup to the embedding quality or downstream verification performance.
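The first minor comment asks for error bars on the reported EERs. One common way to obtain them is a percentile bootstrap over trials, sketched below under stated assumptions: the helper names `eer` and `bootstrap_eer_ci` are hypothetical, the scores are toy values, and resampling individual trials ignores any dependence between trials that share an utterance or attack, so the resulting interval may be optimistic.

```python
import numpy as np

def eer(target, nontarget):
    """EER at the threshold where FAR and FRR are closest (inputs are 1-D arrays)."""
    thresholds = np.sort(np.concatenate([target, nontarget]))
    frr = (target[None, :] < thresholds[:, None]).mean(axis=1)
    far = (nontarget[None, :] >= thresholds[:, None]).mean(axis=1)
    idx = int(np.argmin(np.abs(far - frr)))
    return float((far[idx] + frr[idx]) / 2.0)

def bootstrap_eer_ci(target_scores, nontarget_scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the EER."""
    rng = np.random.default_rng(seed)
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    eers = np.empty(n_boot)
    for b in range(n_boot):
        # Resample target and non-target trials separately, with replacement.
        t = rng.choice(target, size=target.size, replace=True)
        n = rng.choice(nontarget, size=nontarget.size, replace=True)
        eers[b] = eer(t, n)
    lo, hi = np.percentile(eers, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(lo), float(hi)

# Toy usage with made-up scores (not data from the paper).
low, high = bootstrap_eer_ci([0.2, 0.8, 0.6, 0.9], [0.1, 0.7, 0.3, 0.4], n_boot=200)
print(f"95% CI for EER: [{100 * low:.1f}%, {100 * high:.1f}%]")
```

Non-overlapping intervals for two backends on the same split would support the claimed ranking; overlapping ones would suggest the gap needs more trials or a paired test.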

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their thorough review and constructive feedback. We address the concern about validating the disjointness of training attacks in the experimental setup below, and we will incorporate clarifications to strengthen the open-set claims.

Referee: [Experimental setup / attack selection] Experimental setup (abstract and methods description of attack selection): The central open-set claim rests on the assertion that training attacks are kept completely disjoint from those forming fingerprint-trial pairs. However, no explicit validation, overlap analysis, or checks against indirect exposure via SSL pretraining data or shared synthesis artifacts in the adaptation stage are provided. This directly affects whether the reported EER gaps (e.g., few-shot 13.11% ID vs. zero-shot 16.43% OOD) demonstrate true generalization rather than partial leakage.

Authors: We agree that explicit documentation of the attack disjointness is essential to substantiate the open-set evaluation. In the revised manuscript, we will expand the Methods section with a dedicated subsection on attack selection. This will include: (1) an explicit enumeration of the specific deepfake generation methods (e.g., by name or reference to STOPA categories) assigned to the training set versus those reserved exclusively for constructing fingerprint-trial pairs in both ID and OOD partitions; (2) confirmation that the partitions were constructed to ensure zero overlap at the attack-instance level. Regarding indirect exposure, the SSL-AASIST backbone was pre-trained on ASVspoof 2019 LA, whose synthesis algorithms (e.g., conventional TTS/VC) differ from the modern neural vocoders and diffusion-based methods in STOPA; we will add a short paragraph stating this distinction and noting that no STOPA attacks appear in the pre-training corpus. For shared synthesis artifacts during adaptation, the combination of AAM loss and RegMixup encourages the embeddings to capture attack-specific discriminative cues rather than generic artifacts; we will report a supplementary cosine-similarity analysis between training and held-out attack embeddings to quantify any residual overlap. These additions will directly address the leakage concern while preserving the reported EER comparisons.

Revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical comparison on held-out disjoint data

The manuscript reports experimental EER results from adapting SSL-AASIST embeddings (with AAM loss and RegMixup) and comparing zero-shot versus few-shot backends on the STOPA dataset under an explicitly stated disjoint training-attack condition for ID and OOD trials. No equations, predictions, or first-principles derivations are present that reduce by construction to fitted parameters or self-citations; the central claims are direct outcome measurements on held-out pairs, rendering the work self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard supervised learning assumptions plus the domain-specific choice to enforce disjoint training and test attacks; no new entities are postulated and hyperparameters such as loss weights are implicit but not enumerated as free parameters here.

axioms (2)

domain assumption SSL-AASIST embeddings can be enhanced for attack classification via AAM loss and RegMixup while preserving generalization to unseen attacks.
Invoked in the adaptation step described in the abstract.
domain assumption Training attacks remain completely disjoint from fingerprint-trial pairs in the open-set evaluation.
Stated explicitly as a requirement for the zero-shot setup.
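The first axiom concerns additive angular margin (AAM) loss. As a rough illustration of what that loss does, the sketch below computes AAM logits in the ArcFace style, where the target-class logit becomes s·cos(θ + m) and the others stay at s·cos(θ). The margin and scale values are placeholder assumptions, not the paper's hyperparameters, and the usual safeguard for θ + m exceeding π is omitted for brevity.

```python
import numpy as np

def aam_logits(embeddings, class_weights, labels, margin=0.2, scale=30.0):
    """Scaled cosine logits with an angular margin added on the target class."""
    x = np.asarray(embeddings, dtype=float)
    w = np.asarray(class_weights, dtype=float)
    labels = np.asarray(labels)
    # Cosine between unit-length embeddings and unit-length class centers.
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    w = w / np.linalg.norm(w, axis=1, keepdims=True)
    cos = np.clip(x @ w.T, -1.0, 1.0)
    theta = np.arccos(cos)
    rows = np.arange(labels.size)
    logits = cos.copy()
    # Penalize only the true class by widening its angle by `margin`.
    logits[rows, labels] = np.cos(theta[rows, labels] + margin)
    return scale * logits

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy, computed in a numerically stable way."""
    labels = np.asarray(labels)
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(labels.size), labels].mean())

# Toy example: one embedding halfway between two class centers.
emb = np.array([[1.0, 1.0]])
centers = np.array([[1.0, 0.0], [0.0, 1.0]])
print(cross_entropy(aam_logits(emb, centers, [0], margin=0.0), [0]))
print(cross_entropy(aam_logits(emb, centers, [0], margin=0.2), [0]))
```

The margin raises the loss for the same embedding, so training has to pull same-attack embeddings closer in angle, which is the property that makes cosine scoring a natural backend.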

Reference graph

Works this paper leans on

23 extracted references · 23 canonical work pages · 1 internal anchor

[1]

Advancing Zero-Shot Open-Set Speech Deepfake Source Tracing

INTRODUCTION “Trust, once lost, is not easily regained.” Advances in neural speech synthesis and voice conversion now enable the creation of highly realistic spoofed speech [1]. Such speech is often indistinguishable from bonafide human speech, both for listeners and for automatic systems [2]. The research community has responded with increasingly pow...

[2]

Attack Source Verification: Source tracing aims to identify or verify the source of a spoofing attack given an unknown utterance $x$

OPEN-SET ATTACK SOURCE VERIFICATION 2.1. Attack Source Verification. Source tracing aims to identify or verify the source of a spoofing attack given an unknown utterance $x$. In an identification setting, a system $F_I$ predicts the source as $\hat{k} = \arg\max_{k \in \mathcal{A}_{\mathrm{ID}}} F_I(x)_k$ (1), where $\mathcal{A}_{\mathrm{ID}}$ denotes the set of in-distribution (seen or known) attacks. In the closed-set case, th...

[3]

Database: We conduct experiments primarily on the recent, publicly available STOPA [14] dataset

EXPERIMENTAL SETUP, RESULTS AND DISCUSSION 3.1. Database. We conduct experiments primarily on the recent, publicly available STOPA [14] dataset. It contains 699k spoofed utterances from 13 attack systems, formed by combining 8 acoustic models (AMs) and 6 vocoder models (VMs). Each utterance is labeled with its attack id as well as AM and VM ids, enabling multi-l...

[4]

Specifically, we enhanced SSL-AASIST embeddings with AAM loss and incorporated out-of-domain data to improve variability and robustness

CONCLUSION We addressed a realistic and challenging open-set, zero-shot source tracing scenario. Specifically, we enhanced SSL-AASIST embeddings with AAM loss and incorporated out-of-domain data to improve variability and robustness. In zero-shot tracing, cosine similarity generalized best to unseen attacks, while few-shot backends (MLP, Siamese) prov...

[5]

Audio deepfake detection: A survey

Jiangyan Yi, Chenglong Wang, Jianhua Tao, Xiaohui Zhang, Chu Yuan Zhang, and Yan Zhao, “Audio deepfake detection: A survey,” arXiv preprint arXiv:2308.14970, 2023

[6]

A survey on speech deepfake detection

Menglu Li, Yasaman Ahmadiadli, and Xiao-Ping Zhang, “A survey on speech deepfake detection,” ACM Computing Surveys, vol. 57, no. 7, pp. 1–38, 2025

[7]

Source tracing of audio deepfake systems

Nicholas Klein, Tianxiang Chen, Hemlata Tak, Ricardo Casal, and Elie Khoury, “Source tracing of audio deepfake systems,” arXiv preprint arXiv:2407.08016, 2024

[8]

Source tracing: detecting voice spoofing

Tinglong Zhu, Xingming Wang, Xiaoyi Qin, and Ming Li, “Source tracing: detecting voice spoofing,” in 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). IEEE, 2022, pp. 216–220

[9]

Audio deepfake source tracing using multi-attribute open-set identification and verification

Pierre Falez, Tony Marteau, Damien Lolive, and Arnaud Delhay, “Audio deepfake source tracing using multi-attribute open-set identification and verification,” in Proc. Interspeech 2025, 2025, pp. 1528–1532

[10]

Unveiling Audio Deepfake Origins: A Deep Metric learning And Conformer Network Approach With Ensemble Fusion

Ajinkya Kulkarni, Sandipana Dowerah, Tanel Alumäe, and Mathew Magimai Doss, “Unveiling Audio Deepfake Origins: A Deep Metric learning And Conformer Network Approach With Ensemble Fusion,” in Interspeech 2025, 2025, pp. 1533–1537

[11]

TADA: Training-free Attribution and Out-of-Domain Detection of Audio Deepfakes

Adriana Stan, David Combei, Dan Oneata, and Horia Cucu, “TADA: Training-free Attribution and Out-of-Domain Detection of Audio Deepfakes,” in Interspeech 2025, 2025, pp. 1543–1547

[12]

Open-Set Source Tracing of Audio Deepfake Systems

Nicholas Klein, Hemlata Tak, and Elie Khoury, “Open-Set Source Tracing of Audio Deepfake Systems,” in Interspeech 2025, 2025, pp. 1578–1582

[13]

Listen, Analyze, and Adapt to Learn New Attacks: An Exemplar-Free Class Incremental Learning Method for Audio Deepfake Source Tracing

Yang Xiao and Rohan Kumar Das, “Listen, Analyze, and Adapt to Learn New Attacks: An Exemplar-Free Class Incremental Learning Method for Audio Deepfake Source Tracing,” in Interspeech 2025, 2025, pp. 1563–1567

[14]

VIB-based Real Pre-emphasis Audio Deepfake Source Tracing

Thien-Phuc Doan, Kihun Hong, and Souhwan Jung, “VIB-based Real Pre-emphasis Audio Deepfake Source Tracing,” in Interspeech 2025, 2025, pp. 1568–1572

[15]

Synthetic Speech Source Tracing using Metric Learning

Dimitrios Koutsianos, Stavros Zacharopoulos, Yannis Panagakis, and Themos Stafylakis, “Synthetic Speech Source Tracing using Metric Learning,” in Interspeech 2025, 2025, pp. 1558–1562

[16]

Source Verification for Speech Deepfakes

Viola Negroni, Davide Salvi, Paolo Bestagini, and Stefano Tubaro, “Source Verification for Speech Deepfakes,” in Interspeech 2025, 2025, pp. 1548–1552

[17]

Deep residual learning for image recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778

[18]

STOPA: A Dataset of Systematic VariaTion Of DeePfake Audio for Open-Set Source Tracing and Attribution

Anton Firc, Manasi Chhibber, Jagabandhu Mishra, Vishwanath Pratap Singh, Tomi Kinnunen, and Kamil Malinka, “STOPA: A Dataset of Systematic VariaTion Of DeePfake Audio for Open-Set Source Tracing and Attribution,” in Interspeech 2025, 2025, pp. 1553–1557

[19]

Investigating self-supervised front ends for speech spoofing countermeasures

Xin Wang and Junichi Yamagishi, “Investigating self-supervised front ends for speech spoofing countermeasures,” arXiv preprint arXiv:2111.07725, 2021

[20]

ArcFace: Additive angular margin loss for deep face recognition

Jiankang Deng, Jia Guo, Niannan Xue, and Stefanos Zafeiriou, “ArcFace: Additive angular margin loss for deep face recognition,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 4690–4699

[21]

Momentum contrast for unsupervised visual representation learning

Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick, “Momentum contrast for unsupervised visual representation learning,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 9729–9738

[22]

ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech

Xin Wang, Junichi Yamagishi, Massimiliano Todisco, Héctor Delgado, Andreas Nautsch, Nicholas Evans, Md Sahidullah, Ville Vestman, Tomi Kinnunen, Kong Aik Lee, et al., “ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech,” Computer Speech & Language, vol. 64, pp. 101114, 2020

[23]

Automatic speaker verification spoofing and deepfake detection using wav2vec 2.0 and data augmentation

Hemlata Tak, Massimiliano Todisco, Xin Wang, Jee-weon Jung, Junichi Yamagishi, and Nicholas Evans, “Automatic speaker verification spoofing and deepfake detection using wav2vec 2.0 and data augmentation,” arXiv preprint arXiv:2202.12233, 2022
