If It's Good Enough for You, It's Good Enough for Me: Transferability of Audio Sufficiencies across Models
Pith reviewed 2026-05-13 18:50 UTC · model grok-4.3
The pith
Transferability analysis of minimal sufficient audio signals reveals information-theoretic differences between models that accuracy metrics miss.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Given a minimal sufficient signal for a classification on model f, transferability analysis determines whether other models accept this signal as having the same classification. Applied to three tasks, it reveals task-dependent transferability rates and identifies deepfake-detection models whose atypical behavior points to underlying information-processing differences.
What carries the argument
Transferability analysis: checking whether a minimal sufficient signal extracted on one model is accepted by other models as carrying the same classification.
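The check itself can be sketched in a few lines. The `transfers` helper, the toy energy-based model, and the 0.5 acceptance threshold below are illustrative assumptions, since the review does not specify the paper's operational test:

```python
import numpy as np

def transfers(signal, label, target_model, threshold=0.5):
    """Does `target_model` accept `signal` as class `label`?

    `signal` is assumed minimal and sufficient for `label` on some
    source model f; `target_model` maps a signal to class probabilities.
    """
    probs = target_model(signal)
    return bool(int(np.argmax(probs)) == label and probs[label] >= threshold)

def toy_model(signal):
    # Stand-in classifier: label 1 ("fake") iff mean energy is high.
    energy = float(np.mean(signal ** 2))
    p_fake = 1.0 / (1.0 + np.exp(-10.0 * (energy - 0.5)))
    return np.array([1.0 - p_fake, p_fake])

sig = np.full(100, 0.9)               # high-energy candidate signal
print(transfers(sig, 1, toy_model))   # True for this toy model
```

Running the same check with every other model as `target_model` yields the per-pair transfer outcomes that the paper aggregates into transfer rates.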
If this is right
- Music genre sufficient signals transfer successfully in approximately 26% of cases.
- Emotion recognition and deepfake detection show higher variance in transferability rates.
- Some deepfake detection models, dubbed 'flat-earther' models, display transferability behavior that deviates from the rest.
- Transferability analysis uncovers information-theoretic differences between models that are not captured by accuracy or precision.
Where Pith is reading between the lines
- Models with low transferability might rely on different acoustic features than those with high transferability.
- Transferability could guide selection of models for applications where robustness to signal variations matters.
- Further analysis might reveal specific audio features that cause non-transfer in certain models.
Load-bearing premise
A minimal sufficient signal for one model's classification can be reliably identified, and its transfer (or failure to transfer) to other models reflects meaningful differences in information processing.
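One plausible extraction scheme consistent with this premise is occlusion-style greedy masking: silence each window of the signal and keep the deletion only if the model's label survives. The window size, greedy order, and `spike_model` below are assumptions for illustration, not the paper's actual procedure:

```python
import numpy as np

def minimal_sufficient_signal(signal, model, label, win=10):
    """Greedily silence fixed-size windows while the model keeps `label`."""
    x = signal.copy()
    for start in range(0, len(x), win):
        saved = x[start:start + win].copy()
        x[start:start + win] = 0.0            # tentatively silence the window
        if int(np.argmax(model(x))) != label:
            x[start:start + win] = saved      # window was needed; restore it
    return x

def spike_model(x):
    # Stand-in classifier: label 1 iff a loud sample is present anywhere.
    present = float(np.max(np.abs(x)) > 0.5)
    return np.array([1.0 - present, present])

sig = np.full(50, 0.1)
sig[23] = 0.9                                 # the decisive spike
ms = minimal_sufficient_signal(sig, spike_model, 1)
print(np.count_nonzero(ms))                   # 10: only the spike's window survives
```

Whether such a greedy result is truly minimal (and not merely a local optimum) is exactly the verification gap the referee raises below.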
What would settle it
An experiment showing that all models accept the same minimal sufficient signals at similar rates would challenge the claim that transferability reveals unique information differences.
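Such an experiment amounts to filling in a source-by-target transfer matrix and comparing per-target acceptance rates. The energy-threshold "models" below are toy stand-ins under stated assumptions; the deliberately deviant fourth model accepts almost nothing, mimicking 'flat-earther' behavior, whereas roughly uniform rates across all targets would challenge the claim:

```python
import numpy as np

# Toy "models": each thresholds mean signal energy at a different level;
# the last threshold is deliberately deviant.
thresholds = [0.40, 0.45, 0.50, 0.80]
models = [lambda x, t=t: int(np.mean(x ** 2) > t) for t in thresholds]

# Stand-in minimal signals for label 1: just enough energy to clear each
# source model's own threshold.
signals = [np.full(100, np.sqrt(t + 0.01)) for t in thresholds]

# transfer[i, j] = 1 if target model j accepts source model i's signal.
transfer = np.array([[m(s) for m in models] for s in signals])
rates = transfer.mean(axis=0)   # per-target acceptance: 1.0, 0.75, 0.5, 0.25
print(rates)
```

With real models the comparison would additionally need a statistical test for whether the observed spread in rates exceeds what signal-extraction noise alone would produce.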
Original abstract
In order to gain fresh insights about the information processing characteristics of different audio classification models, we propose transferability analysis. Given a minimal, sufficient signal for a classification on a model $f$, transferability analysis asks whether other models accept this minimal signal as having the same classification as it did on $f$. We define what it means for a sufficient signal to be transferable and perform a large study over $3$ different classification tasks: music genre, emotion recognition and deepfake detection. We find that transferability rates vary depending on the task, with sufficient signals for music genre being transferable $\approx26\%$ of the time. The other tasks reveal much higher variance in transferability and reveal that some models, in particular on deepfake detection, have different transferability behavior. We call these models `flat-earther' models. We investigate deepfake audio in more depth, and show that transferability analysis also allows to us to discover information theoretic differences between the models which are not captured by the more familiar metrics of accuracy and precision.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes transferability analysis as a method to probe information-processing differences among audio classification models. Given a minimal sufficient signal for a classification decision on model f, the approach tests whether other models assign the same label to that signal. Experiments across music genre classification, emotion recognition, and deepfake detection report task-dependent transfer rates (approximately 26% for music genre) and identify a subset of deepfake models, termed 'flat-earther' models, whose transfer behavior deviates from the others. The authors argue that these transfer patterns expose information-theoretic distinctions not visible in accuracy or precision metrics.
Significance. If the empirical findings are reproducible, the work supplies a practical diagnostic for comparing audio models that goes beyond scalar performance numbers. The deepfake results in particular suggest that transferability can surface model-specific sensitivities to minimal cues, which may inform robustness evaluation and model selection in security-sensitive audio tasks.
major comments (3)
- [Abstract] Abstract and the description of the method: the central claim that transferability rates reveal information-theoretic differences rests on the reliable extraction of minimal sufficient signals, yet no concrete procedure, optimization criterion, or verification step for identifying these signals is supplied. Without this, the reported 26% transfer rate and the distinction drawn for flat-earther models cannot be assessed for robustness or reproducibility.
- [Deepfake experiments] Deepfake detection results: the assertion that transferability uncovers distinctions missed by accuracy and precision requires an explicit comparison (e.g., correlation analysis or controlled ablation) between the two families of metrics; the current presentation leaves open whether the observed variance is an artifact of the signal-minimization process rather than a genuine information-theoretic signal.
- [Method definition] Definition of transferability: the paper states that it 'defines what it means for a sufficient signal to be transferable,' but the operational test (threshold on model output, agreement metric, or statistical test) is not specified. This definition is load-bearing for all quantitative claims and must be stated formally before the empirical rates can be interpreted.
minor comments (1)
- [Abstract] Abstract contains the grammatical error 'allows to us to discover'; correct to 'allows us to discover'.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which highlight important areas for improving clarity and reproducibility. We address each major comment below and will revise the manuscript to incorporate the requested details.
Point-by-point responses
- Referee: [Abstract] Abstract and the description of the method: the central claim that transferability rates reveal information-theoretic differences rests on the reliable extraction of minimal sufficient signals, yet no concrete procedure, optimization criterion, or verification step for identifying these signals is supplied. Without this, the reported 26% transfer rate and the distinction drawn for flat-earther models cannot be assessed for robustness or reproducibility.
Authors: We agree that the current manuscript lacks sufficient detail on the extraction of minimal sufficient signals. In the revised version, we will add a dedicated subsection describing the concrete optimization procedure (including the loss function and constraints used to minimize the signal while preserving the classification decision on model f), the specific algorithm employed, and verification steps such as ablation checks and sensitivity analysis to confirm minimality and sufficiency. This will enable readers to reproduce and assess the robustness of the reported transfer rates and model distinctions. revision: yes
- Referee: [Deepfake experiments] Deepfake detection results: the assertion that transferability uncovers distinctions missed by accuracy and precision requires an explicit comparison (e.g., correlation analysis or controlled ablation) between the two families of metrics; the current presentation leaves open whether the observed variance is an artifact of the signal-minimization process rather than a genuine information-theoretic signal.
Authors: We will strengthen this section by adding an explicit comparison between transferability patterns and standard metrics. Specifically, we will include a correlation analysis across models between transfer rates and accuracy/precision values, along with a controlled ablation that varies the signal-minimization parameters while tracking both metric families. This will demonstrate that the observed distinctions for flat-earther models are not artifacts of the minimization process. revision: yes
- Referee: [Method definition] Definition of transferability: the paper states that it 'defines what it means for a sufficient signal to be transferable,' but the operational test (threshold on model output, agreement metric, or statistical test) is not specified. This definition is load-bearing for all quantitative claims and must be stated formally before the empirical rates can be interpreted.
Authors: We acknowledge that the operational definition requires formalization. In the revision, we will state the definition explicitly, including the precise threshold applied to model output probabilities for label agreement, the agreement metric (e.g., exact label match or probabilistic divergence), and any statistical tests used to determine transferability. This formalization will be placed early in the methods section to support all subsequent quantitative claims. revision: yes
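The comparison against standard metrics promised in the second response could look like the following sketch. All per-model numbers here are hypothetical placeholders, not results from the paper; the point is that a model can top the accuracy ranking while accepting almost no transferred signals, which a simple Pearson correlation makes visible:

```python
import numpy as np

# Hypothetical per-model numbers (placeholders, not results from the paper).
# transfer_rate = fraction of other models' minimal signals a model accepts.
accuracy      = np.array([0.91, 0.93, 0.94, 0.92, 0.90])
transfer_rate = np.array([0.30, 0.28, 0.05, 0.31, 0.27])  # model 3: low acceptance

# Pearson correlation between the two metric families; a weak or negative
# value would indicate that transferability carries information that the
# accuracy ranking does not.
r = np.corrcoef(accuracy, transfer_rate)[0, 1]
print(round(float(r), 3))

# Under these placeholder numbers, the most accurate model is also the
# least accepting of transferred signals.
print(int(np.argmax(accuracy)) == int(np.argmin(transfer_rate)))
```

The controlled ablation the referee asks for would repeat this computation while varying the signal-minimization parameters, checking that the decoupling persists.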
Circularity Check
Empirical study with no circular derivations
Full rationale
The paper defines transferability analysis upfront as an empirical procedure: given a minimal sufficient signal for model f, check whether other models accept the same classification. Results are obtained from a large-scale study across three tasks (music genre, emotion recognition, deepfake detection), reporting observed transfer rates and identifying 'flat-earther' models in deepfake detection. No equations, parameters, or central claims reduce by construction to fitted inputs, self-definitions, or self-citation chains. The information-theoretic differences are presented as direct observations from varying transferability behavior, not derived from prior work by the same authors or renamed known results. The derivation chain is self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: minimal sufficient signals can be identified for a given model's classification decision.
invented entities (1)
- 'flat-earther' models (no independent evidence)