DeePen: Penetration Testing for Audio Deepfake Detection
Pith reviewed 2026-05-23 02:40 UTC · model grok-4.3
The pith
Audio deepfake detectors can be deceived by simple signal-processing attacks such as time-stretching and echo addition.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Using DeePen, we analyze both real-world production systems and publicly available academic model checkpoints, demonstrating that all tested systems exhibit weaknesses and can be reliably deceived by simple manipulations such as time-stretching or echo addition. Furthermore, our findings reveal that while some attacks can be mitigated by retraining detection systems with knowledge of the specific attack, others remain persistently effective.
What carries the argument
DeePen, a black-box penetration testing approach that applies a set of signal processing attacks to probe vulnerabilities in deepfake detectors without prior knowledge of the models.
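The black-box probing loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the detector is assumed to be reachable only as a callable that maps a waveform to a fake-probability, and `add_echo`, `time_stretch`, `probe`, and `toy_detector` are hypothetical names with illustrative parameter values. The naive resampling stretch used here also shifts pitch, which a phase-vocoder stretch would avoid.

```python
import math

def add_echo(wave, sample_rate, delay_s=0.2, decay=0.5):
    """Mix a delayed, attenuated copy of the signal into itself."""
    delay = int(round(delay_s * sample_rate))
    out = list(wave) + [0.0] * delay
    for i, x in enumerate(wave):
        out[i + delay] += decay * x
    return out

def time_stretch(wave, rate=1.1):
    """Naive stretch by linear-interpolation resampling; rate > 1 shortens.
    Unlike a phase-vocoder stretch, this also shifts pitch."""
    n_out = max(1, int(len(wave) / rate))
    last = len(wave) - 1
    out = []
    for k in range(n_out):
        pos = k * rate
        i = min(int(pos), last)
        frac = pos - i
        out.append((1.0 - frac) * wave[i] + frac * wave[min(i + 1, last)])
    return out

def probe(detector, wave, sample_rate, attacks):
    """Score the clean and attacked audio using only the detector's output."""
    scores = {"no_attack": detector(wave)}
    for name, attack in attacks.items():
        scores[name] = detector(attack(wave, sample_rate))
    return scores

def toy_detector(wave):
    """Hypothetical stand-in for a real detector: returns a score in [0, 1]."""
    return min(1.0, sum(abs(x) for x in wave) / len(wave))

SAMPLE_RATE = 16000
# One second of a 220 Hz tone as placeholder audio.
wave = [0.5 * math.sin(2 * math.pi * 220 * t / SAMPLE_RATE) for t in range(SAMPLE_RATE)]
attacks = {
    "echo": lambda w, sr: add_echo(w, sr, delay_s=0.2, decay=0.5),
    "time_stretch": lambda w, sr: time_stretch(w, rate=1.1),
}
scores = probe(toy_detector, wave, SAMPLE_RATE, attacks)
```

The harness never inspects model internals: swapping `toy_detector` for a call to a remote API would leave `probe` unchanged, which is the sense in which the test is black-box.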
If this is right
- All tested deepfake detection systems can be reliably deceived by basic manipulations such as time-stretching or echo addition.
- Retraining detection systems with knowledge of specific attacks can mitigate some vulnerabilities but not others.
- Production systems and academic models alike show these weaknesses under black-box testing.
Where Pith is reading between the lines
- Detectors may require ongoing adaptation to new attack vectors beyond the tested set.
- Black-box testing of this kind could become standard for evaluating the security of media-detection tools.
- Alternative detection strategies, such as those based on different features, might be needed to achieve robustness.
Load-bearing premise
The carefully selected set of signal processing modifications is sufficient to expose meaningful vulnerabilities in deepfake detection models in a black-box setting.
What would settle it
A deepfake detector that maintains high accuracy on audio modified by all the DeePen attacks would disprove the universal vulnerability claim.
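One way to make this falsification test concrete is to measure accuracy under every attack and look at the worst case. The sketch below is an assumption-laden illustration: the 0.9 accuracy bar, the 0.5 decision threshold, the peak-amplitude toy detector, and the `halve_gain` attack are all hypothetical and are not taken from the paper.

```python
def accuracy(detector, samples, attack=None, threshold=0.5):
    """Fraction of (waveform, is_fake) pairs classified correctly,
    optionally after applying an attack to every waveform."""
    correct = 0
    for wave, is_fake in samples:
        attacked = attack(wave) if attack else wave
        predicted_fake = detector(attacked) >= threshold
        correct += predicted_fake == is_fake
    return correct / len(samples)

def survives_all_attacks(detector, samples, attacks, min_accuracy=0.9):
    """Return (passed, worst_attack_name, per_attack_accuracy)."""
    per_attack = {name: accuracy(detector, samples, fn) for name, fn in attacks.items()}
    worst = min(per_attack, key=per_attack.get)
    return per_attack[worst] >= min_accuracy, worst, per_attack

# Toy data: this stand-in detector treats loud audio as fake.
samples = [
    ([0.9, -0.9], True),
    ([0.8, -0.1], True),
    ([0.2, 0.1], False),
    ([0.1, -0.2], False),
]
detector = lambda w: max(abs(x) for x in w)
attacks = {
    "identity": lambda w: list(w),
    "halve_gain": lambda w: [0.5 * x for x in w],
}
passed, worst, per_attack = survives_all_attacks(detector, samples, attacks)
```

Reporting the worst attack rather than the mean matters here: the universal-vulnerability claim is refuted only by a detector that holds up under every attack, not one that does well on average.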
Original abstract
Deepfakes - manipulated or forged audio and video media - pose significant security risks to individuals, organizations, and society at large. To address these challenges, machine learning-based classifiers are commonly employed to detect deepfake content. In this paper, we assess the robustness of such classifiers through a systematic penetration testing methodology, which we introduce as DeePen. Our approach operates without prior knowledge of or access to the target deepfake detection models. Instead, it leverages a set of carefully selected signal processing modifications - referred to as attacks - to evaluate model vulnerabilities. Using DeePen, we analyze both real-world production systems and publicly available academic model checkpoints, demonstrating that all tested systems exhibit weaknesses and can be reliably deceived by simple manipulations such as time-stretching or echo addition. Furthermore, our findings reveal that while some attacks can be mitigated by retraining detection systems with knowledge of the specific attack, others remain persistently effective.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces DeePen, a black-box penetration testing methodology for audio deepfake detection models. It applies a fixed set of signal-processing attacks (e.g., time-stretching, echo addition) without model access or prior knowledge, evaluates both production systems and academic checkpoints, and claims that all tested detectors are vulnerable; some attacks can be mitigated by retraining while others remain effective.
Significance. If the attack set is shown to be chosen independently and the empirical results are reproducible, the work would usefully document concrete weaknesses in deployed detectors and the limits of simple augmentation-based defenses.
major comments (3)
- [Abstract / §4] Abstract and §4 (evaluation): no quantitative success rates, attack-selection criteria, or experimental protocol are stated, so it is impossible to assess whether the reported vulnerabilities are load-bearing or merely narrow sensitivities.
- [§3] §3 (attack design): the claim that the signal-processing modifications were selected without knowledge of the target models must be supported by an explicit, a-priori list and justification; otherwise the black-box robustness conclusion is circular.
- [§5] §5 (mitigation experiments): retraining with attack knowledge presupposes white-box access that is unavailable in the black-box phase; the asymmetry must be justified or the two phases cannot be compared directly.
minor comments (2)
- [§4] Add a table listing the exact attacks, their parameters, and per-model success rates.
- [§3] Clarify whether any of the listed attacks overlap with standard data-augmentation pipelines already used by the detectors.
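The table requested in the minor comments could be generated along these lines. This is a sketch only: the model names, parameter strings, and evasion outcomes are placeholders invented for illustration, not results from the paper, and "success rate" is assumed here to mean the share of attacked fake clips that a detector labels as real.

```python
def success_rate(outcomes):
    """Share of attacked fakes the detector labeled as real (True = evaded)."""
    return sum(outcomes) / len(outcomes)

def render_table(rows, models):
    """Render one line per attack: name, parameters, per-model success rate."""
    header = ["attack", "parameters"] + models
    lines = [" | ".join(header)]
    for attack, params, outcomes_by_model in rows:
        cells = [attack, params]
        cells += [f"{success_rate(outcomes_by_model[m]):.0%}" for m in models]
        lines.append(" | ".join(cells))
    return "\n".join(lines)

models = ["model_a", "model_b"]  # hypothetical detector names
rows = [
    # Placeholder outcomes, not measurements.
    ("time_stretch", "rate=1.1",
     {"model_a": [True, True, False, True], "model_b": [False, True, False, False]}),
    ("echo", "delay=0.2s, decay=0.5",
     {"model_a": [True, True], "model_b": [True, False]}),
]
table = render_table(rows, models)
print(table)
```

Keeping the parameters in the same row as the rates makes it possible to tell a broad weakness from a sensitivity to one specific setting.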
Simulated Author's Rebuttal
We thank the referee for the detailed review and constructive comments. We respond to each major comment below and will make revisions to improve clarity, provide explicit details, and strengthen the presentation of our black-box methodology without altering the core claims.
- Referee: [Abstract / §4] Abstract and §4 (evaluation): no quantitative success rates, attack-selection criteria, or experimental protocol are stated, so it is impossible to assess whether the reported vulnerabilities are load-bearing or merely narrow sensitivities.
  Authors: We agree that the abstract is high-level and does not include specific quantitative success rates or attack-selection criteria. Section 4 reports results across production systems and academic checkpoints with attacks such as time-stretching and echo addition, but the experimental protocol and selection criteria can be stated more explicitly. We will revise the abstract to include representative quantitative success rates and expand §4 with a dedicated subsection on the experimental protocol and attack-selection criteria to enable assessment of the results' generality.
  revision: yes
- Referee: [§3] §3 (attack design): the claim that the signal-processing modifications were selected without knowledge of the target models must be supported by an explicit, a-priori list and justification; otherwise the black-box robustness conclusion is circular.
  Authors: The attacks were selected as standard, widely used signal-processing operations (time-stretching, echo addition, and similar) drawn from general audio processing literature, without reference to any target detector. To remove any ambiguity, we will add to the revised §3 an explicit enumerated list of all attacks together with a-priori justification based solely on their effects on audio signals, independent of the models later evaluated.
  revision: yes
- Referee: [§5] §5 (mitigation experiments): retraining with attack knowledge presupposes white-box access that is unavailable in the black-box phase; the asymmetry must be justified or the two phases cannot be compared directly.
  Authors: The black-box phase evaluates detectors with zero model access or knowledge. The mitigation experiments are a separate analysis that assumes only that the defender knows the attack type (not model internals) and can augment training data accordingly; this does not require white-box access. We will revise §5 to add an explicit justification of this distinction, clarifying that the two phases address different questions and are not intended to be compared under identical access assumptions.
  revision: partial
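The mitigation setting described in the last response, where the defender knows the attack type and augments the training data, might look like the sketch below. It is an assumption, not the paper's training recipe: `augment_with_attack`, the 50% augmentation fraction, the fixed seed, and the sample-domain echo are hypothetical choices, and the actual retraining step is left out.

```python
import random

def add_echo(wave, delay=2, decay=0.5):
    """Sample-domain echo used as the known attack in this toy example."""
    out = list(wave) + [0.0] * delay
    for i, x in enumerate(wave):
        out[i + delay] += decay * x
    return out

def augment_with_attack(dataset, attack, fraction=0.5, seed=0):
    """Return the dataset plus attacked copies of a random subset.
    Labels are kept: an echoed fake is still a fake."""
    rng = random.Random(seed)
    n_aug = int(len(dataset) * fraction)
    chosen = rng.sample(range(len(dataset)), n_aug)
    augmented = list(dataset)
    for idx in chosen:
        wave, is_fake = dataset[idx]
        augmented.append((attack(wave), is_fake))
    return augmented

# Placeholder training set of (waveform, is_fake) pairs.
train = [
    ([0.1, 0.2, 0.3], True),
    ([0.3, 0.1, 0.0], False),
    ([0.5, 0.4, 0.2], True),
    ([0.0, 0.1, 0.1], False),
]
train_aug = augment_with_attack(train, add_echo, fraction=0.5)
```

The function touches only the defender's own training data and the attack transform, which is consistent with the authors' point that this step needs knowledge of the attack type rather than access to model internals.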
Circularity Check
No circularity: empirical black-box testing with external targets
The paper introduces DeePen as a black-box penetration testing methodology that applies a fixed set of signal-processing modifications to evaluate existing deepfake detectors. No equations, fitted parameters, self-definitional loops, or load-bearing self-citations appear in the abstract or described claims. The central results consist of empirical observations on real-world production systems and public academic checkpoints; success or failure of the attacks is measured against those external models rather than being constructed from the paper's own inputs. The methodology is therefore self-contained as an evaluation procedure without reduction to its own fitted values or prior author results.
Forward citations
Cited by 2 Pith papers
- RADAR Challenge 2026: Robust Audio Deepfake Recognition under Media Transformations
  The RADAR Challenge 2026 provides a multilingual benchmark for audio deepfake detection under media transformations and finds that robust performance remains an open problem.
- RADAR Challenge 2026: Robust Audio Deepfake Recognition under Media Transformations
  RADAR Challenge 2026 describes a benchmark with over 100,000 multilingual utterances and media transformations for audio deepfake detection, reporting results from 22 teams that highlight ongoing robustness issues.
[Results table residue, truncated in the source. Attack columns: Add Background Music, Add Background Noise, Amplitude Modulation, Autotune, Bit Depth Change, Echo, Equalization, Freq Minus, Freq Plus, Gaussian Noise, High Pass Filter, Low Pass Filter, MP3 Compression, Pitch Shift, Reverb, Silence Injection, Time Stretch, No Attack, Mean. First row label: adaptive def...]